-
-
Notifications
You must be signed in to change notification settings - Fork 171
Description
Description:
Background
Yorkie uses MongoDB and organizes data into three categories of collections:
- cluster-wide collections:
users
,projects
- project-wide collections (
sharding key: project_id, strategy: ranged
):documents
,clients
,schemas
- document-wide collections (
sharding key: doc_id, strategy: hashed
):changes
,versionvectors
,snapshots
Among them, project-wide and document-wide collections are frequently accessed by the SDK and handle high traffic. To accommodate this, sharding is applied to both types.
Since ObjectId
s are time-based, records created around the same time were concentrated on the same shard, causing uneven load. So we recently applied hashed sharding to doc_id
for document-wide collections(#1323). This was resolved by switching to hashed sharding, which helped distribute the data more evenly.
Current Issues
For project-wide collections, we currently use project_id
with ranged sharding, which leads to similar issues when the number of projects is small. In such cases, traffic tends to concentrate on a few shards.
This becomes especially problematic when a few high-traffic projects dominate the workload, potentially creating hotspots and performance bottlenecks.
Proposal
Collections like clients
, documents
, and schemas
include fields such as key
or name
that could be used in combination with project_id
as sharding keys to better distribute the data across shards.
However, we need to analyze the impact on existing queries, especially those that rely on filtering by project_id
, to avoid introducing regressions.
clients
:FindDeactivateCandidatesPerProject
documents
:FindCompactionCandidatesPerProject
,FindDocInfosByPaging
,FindDocInfosByQuery
schemas
:GetSchemaInfos
,ListSchemaInfos
Validation Plan
- We can run a MongoDB shared cluster on your local machine(https://github.com/yorkie-team/yorkie/tree/main/build/docker/sharding).
- We already have a load testing tool in place to simulate traffic(https://github.com/yorkie-team/yorkie/tree/main/test/k6).
- We plan to use MongoDB Compass to monitor per-shard load, latency, and distribution in real time.
- By comparing performance before and after the sharding key changes, we aim to validate the effectiveness of the new strategy with quantitative benchmarks.
TODO
- Identify candidate sharding keys:
project_id + key
,project_id + name
, etc. - Analyze current query patterns and indexes:
- How often is
project_id
used as a filter? - Will compound
hashed
keys impact query performance?
- How often is
- Use the load testing tool to simulate realistic scenarios.
- Use MongoDB Compass to monitor shard utilization and latency.
Evaluate whether the new sharding strategy can be applied to production environments safely.
Informations:
A. sh.status
result
test> sh.status()
shardedDataDistribution
[
{
ns: 'yorkie-meta.versionvectors',
shards: [
{
shardName: 'mongodb-shard-0',
numOrphanedDocs: 0,
numOwnedDocuments: 82,
ownedSizeBytes: 16236,
orphanedSizeBytes: 0
},
{
shardName: 'mongodb-shard-1',
numOrphanedDocs: 0,
numOwnedDocuments: 129,
ownedSizeBytes: 43344,
orphanedSizeBytes: 0
}
]
},
{
ns: 'yorkie-meta.clients',
shards: [
{
shardName: 'mongodb-shard-1',
numOrphanedDocs: 0,
numOwnedDocuments: 347247,
ownedSizeBytes: 86811750,
orphanedSizeBytes: 0
}
]
},
{
ns: 'yorkie-meta.documents',
shards: [
{
shardName: 'mongodb-shard-1',
numOrphanedDocs: 0,
numOwnedDocuments: 26059,
ownedSizeBytes: 5263918,
orphanedSizeBytes: 0
}
]
},
{
ns: 'yorkie-meta.changes',
shards: [
{
shardName: 'mongodb-shard-0',
numOrphanedDocs: 0,
numOwnedDocuments: 225452,
ownedSizeBytes: 497798016,
orphanedSizeBytes: 0
},
{
shardName: 'mongodb-shard-1',
numOrphanedDocs: 0,
numOwnedDocuments: 341670,
ownedSizeBytes: 606464250,
orphanedSizeBytes: 0
}
]
}
]
---
databases
{
database: {
_id: 'yorkie-meta',
primary: 'mongodb-shard-1',
partitioned: false,
version: {
uuid: UUID('21c6b931-7b4d-4285-ad97-c998f6d42c6f'),
timestamp: Timestamp({ t: 1706613414, i: 1 }),
lastMod: 1
}
},
collections: {
'yorkie-meta.changes': {
shardKey: { doc_id: 'hashed' },
unique: false,
balancing: true,
chunkMetadata: [
{ shard: 'mongodb-shard-0', nChunks: 2 },
{ shard: 'mongodb-shard-1', nChunks: 2 }
],
chunks: [
{ min: { doc_id: MinKey() }, max: { doc_id: Long('-4611686018427387902') }, 'on shard': 'mongodb-shard-0', 'last modified': Timestamp({ t: 1, i: 0 }) },
{ min: { doc_id: Long('-4611686018427387902') }, max: { doc_id: Long('0') }, 'on shard': 'mongodb-shard-0', 'last modified': Timestamp({ t: 1, i: 1 }) },
{ min: { doc_id: Long('0') }, max: { doc_id: Long('4611686018427387902') }, 'on shard': 'mongodb-shard-1', 'last modified': Timestamp({ t: 1, i: 2 }) },
{ min: { doc_id: Long('4611686018427387902') }, max: { doc_id: MaxKey() }, 'on shard': 'mongodb-shard-1', 'last modified': Timestamp({ t: 1, i: 3 }) }
],
tags: []
},
'yorkie-meta.clients': {
shardKey: { project_id: 1 },
unique: false,
balancing: true,
chunkMetadata: [ { shard: 'mongodb-shard-1', nChunks: 1 } ],
chunks: [
{ min: { project_id: MinKey() }, max: { project_id: MaxKey() }, 'on shard': 'mongodb-shard-1', 'last modified': Timestamp({ t: 1, i: 0 }) }
],
tags: []
},
'yorkie-meta.documents': {
shardKey: { project_id: 1 },
unique: false,
balancing: true,
chunkMetadata: [ { shard: 'mongodb-shard-1', nChunks: 1 } ],
chunks: [
{ min: { project_id: MinKey() }, max: { project_id: MaxKey() }, 'on shard': 'mongodb-shard-1', 'last modified': Timestamp({ t: 1, i: 0 }) }
],
tags: []
},
'yorkie-meta.versionvectors': {
shardKey: { doc_id: 'hashed' },
unique: false,
balancing: true,
chunkMetadata: [
{ shard: 'mongodb-shard-0', nChunks: 2 },
{ shard: 'mongodb-shard-1', nChunks: 2 }
],
chunks: [
{ min: { doc_id: MinKey() }, max: { doc_id: Long('-4611686018427387902') }, 'on shard': 'mongodb-shard-0', 'last modified': Timestamp({ t: 1, i: 0 }) },
{ min: { doc_id: Long('-4611686018427387902') }, max: { doc_id: Long('0') }, 'on shard': 'mongodb-shard-0', 'last modified': Timestamp({ t: 1, i: 1 }) },
{ min: { doc_id: Long('0') }, max: { doc_id: Long('4611686018427387902') }, 'on shard': 'mongodb-shard-1', 'last modified': Timestamp({ t: 1, i: 2 }) },
{ min: { doc_id: Long('4611686018427387902') }, max: { doc_id: MaxKey() }, 'on shard': 'mongodb-shard-1', 'last modified': Timestamp({ t: 1, i: 3 }) }
],
tags: []
}
}
}
Why:
Metadata
Metadata
Assignees
Labels
Type
Projects
Status