Description:
Background
Yorkie assigns responsibility for each document to a specific server using consistent hashing. During PushPull and other document-related APIs, ClientInfo.Checkpoint is frequently read and updated. Under high load (such as during Presence load tests), this creates significant pressure on the backing DB (the clients collection in MongoDB).
To improve performance, especially for PushPull-related APIs, I propose introducing a write-back in-memory cache for Checkpoint. This will reduce DB access frequency and improve response time.
Proposed Design
- Each server will maintain an in-memory cache of Checkpoint for documents it currently serves.
- The Checkpoint structure contains serverSeq and clientSeq, both of which are monotonically increasing.
- The cache will be periodically flushed to the DB in batches.
- During flush, rather than overwriting the DB value, we use MongoDB’s $max operator to merge the cached and stored values:
db.clients.updateOne(
  { ... }, // filter selecting the client document (elided here)
  { $max: { "checkpoint.server_seq": ..., "checkpoint.client_seq": ... } }
);
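The design above can be sketched in Go, the language Yorkie is written in. This is a minimal illustration under stated assumptions, not Yorkie's actual code: `Key`, `Store`, `memStore`, and `Cache` are hypothetical names, and `memStore` stands in for MongoDB by applying the same per-field maximum that `$max` would apply.

```go
package main

import (
	"fmt"
	"sync"
)

// Checkpoint mirrors the two monotonically increasing counters.
type Checkpoint struct {
	ServerSeq int64
	ClientSeq uint32
}

// Key identifies one client's checkpoint for one document (hypothetical shape).
type Key struct {
	ClientID string
	DocID    string
}

// Store is a hypothetical persistence interface. A real implementation would
// send one bulk write of $max updates to the clients collection.
type Store interface {
	MergeMax(batch map[Key]Checkpoint)
}

// memStore stands in for MongoDB and keeps the larger value of each field.
type memStore struct {
	data map[Key]Checkpoint
}

func (s *memStore) MergeMax(batch map[Key]Checkpoint) {
	for k, cp := range batch {
		cur := s.data[k]
		if cp.ServerSeq > cur.ServerSeq {
			cur.ServerSeq = cp.ServerSeq
		}
		if cp.ClientSeq > cur.ClientSeq {
			cur.ClientSeq = cp.ClientSeq
		}
		s.data[k] = cur
	}
}

// Cache is a write-back cache: updates stay in memory until Flush is called.
type Cache struct {
	mu    sync.Mutex
	items map[Key]Checkpoint
	dirty map[Key]struct{}
}

func NewCache() *Cache {
	return &Cache{items: map[Key]Checkpoint{}, dirty: map[Key]struct{}{}}
}

// Update records a new checkpoint and marks the entry as needing a flush.
func (c *Cache) Update(k Key, cp Checkpoint) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[k] = cp
	c.dirty[k] = struct{}{}
}

// Flush writes all dirty entries as one batch and returns how many were sent.
// The cached entries themselves are kept; only the dirty set is cleared.
func (c *Cache) Flush(s Store) int {
	c.mu.Lock()
	batch := make(map[Key]Checkpoint, len(c.dirty))
	for k := range c.dirty {
		batch[k] = c.items[k]
	}
	c.dirty = map[Key]struct{}{}
	c.mu.Unlock()

	if len(batch) > 0 {
		s.MergeMax(batch)
	}
	return len(batch)
}

func main() {
	store := &memStore{data: map[Key]Checkpoint{}}
	cache := NewCache()
	k := Key{ClientID: "client-1", DocID: "doc-1"}

	// Three PushPull-style updates cost a single store write.
	cache.Update(k, Checkpoint{ServerSeq: 1, ClientSeq: 1})
	cache.Update(k, Checkpoint{ServerSeq: 2, ClientSeq: 2})
	cache.Update(k, Checkpoint{ServerSeq: 3, ClientSeq: 2})
	n := cache.Flush(store)
	fmt.Println(n, store.data[k])
}
```

In a server, `Flush` would typically be driven by a `time.Ticker`. The sketch ignores write failures; a real flush would need to re-mark entries as dirty when the batch write fails.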
Consistency Challenge: Stale Cache
In distributed environments (e.g., with Istio managing traffic routing), servers are unaware when document ownership changes due to scale-out/in or restarts. This can result in stale Checkpoint values remaining in memory, which may be used if the same server later resumes ownership of the document. Naively flushing this stale data could overwrite a newer state in the DB.
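To make the hazard concrete, the following Go sketch compares a naive overwrite with a per-field maximum merge for a single stale entry. The sequence numbers and the helper names `overwrite` and `mergeMax` are made up for illustration; `mergeMax` only models what `$max` does on the database side.

```go
package main

import "fmt"

// Checkpoint mirrors the two monotonically increasing counters.
type Checkpoint struct {
	ServerSeq int64
	ClientSeq uint32
}

// overwrite models a naive flush: the cached value replaces the stored one.
func overwrite(stored, cached Checkpoint) Checkpoint {
	return cached
}

// mergeMax models a $max-style flush: each field keeps the larger value.
func mergeMax(stored, cached Checkpoint) Checkpoint {
	out := stored
	if cached.ServerSeq > out.ServerSeq {
		out.ServerSeq = cached.ServerSeq
	}
	if cached.ClientSeq > out.ClientSeq {
		out.ClientSeq = cached.ClientSeq
	}
	return out
}

func main() {
	// Server A cached this value, then lost ownership of the document.
	stale := Checkpoint{ServerSeq: 10, ClientSeq: 4}
	// Server B advanced the stored checkpoint in the meantime.
	stored := Checkpoint{ServerSeq: 25, ClientSeq: 9}

	fmt.Println(overwrite(stored, stale)) // {10 4}: the DB moves backward
	fmt.Println(mergeMax(stored, stale))  // {25 9}: the DB is unchanged
}
```

The merge protects only the stored value. Server A's in-memory copy is still stale, so reads served from that cache need separate handling.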
One Proposed Strategy
To strike a balance between performance and consistency, I propose the following approach:
- Flush behavior:
  - Use $max in MongoDB to ensure only newer values are persisted.
  - Skip reading back from the DB after the flush.
  - Retain the in-memory cache as-is.
- Cache invalidation:
  - Use a TTL (e.g., 30 seconds) to periodically evict potentially stale entries.
  - On subsequent access (e.g., a new PushPull), reload from the DB if the cache entry is missing.
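The invalidation half of this strategy might look like the Go sketch below. It assumes the TTL is measured from the time an entry was loaded rather than from its last access, so that a continuously used entry still expires; that choice, along with `TTLCache`, `fakeClock`, `defaultTTL`, and the `load` callback, is an assumption made for illustration and not part of Yorkie.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultTTL uses the 30-second example value from the proposal.
const defaultTTL = 30 * time.Second

// Checkpoint mirrors the two monotonically increasing counters.
type Checkpoint struct {
	ServerSeq int64
	ClientSeq uint32
}

type entry struct {
	cp       Checkpoint
	loadedAt time.Time
}

// TTLCache treats an entry as missing once it is older than ttl and reloads it.
type TTLCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	now   func() time.Time            // injectable clock, for tests
	load  func(key string) Checkpoint // hypothetical DB read
	items map[string]entry
}

func NewTTLCache(ttl time.Duration, now func() time.Time, load func(string) Checkpoint) *TTLCache {
	return &TTLCache{ttl: ttl, now: now, load: load, items: map[string]entry{}}
}

// Get returns the cached checkpoint while it is fresh, else reloads from the DB.
func (c *TTLCache) Get(key string) Checkpoint {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[key]; ok && c.now().Sub(e.loadedAt) < c.ttl {
		return e.cp
	}
	cp := c.load(key)
	c.items[key] = entry{cp: cp, loadedAt: c.now()}
	return cp
}

// fakeClock is a manually advanced clock so the example runs instantly.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clk := &fakeClock{t: time.Unix(0, 0)}
	db := map[string]Checkpoint{"doc-1": {ServerSeq: 10, ClientSeq: 4}}
	loads := 0
	cache := NewTTLCache(defaultTTL, clk.Now, func(key string) Checkpoint {
		loads++
		return db[key]
	})

	cp := cache.Get("doc-1") // first access: loaded from the DB
	fmt.Println(cp, loads)

	// Another server advances the stored checkpoint.
	db["doc-1"] = Checkpoint{ServerSeq: 25, ClientSeq: 9}
	cp = cache.Get("doc-1") // within the TTL: the stale cached value is served
	fmt.Println(cp, loads)

	clk.Advance(defaultTTL + time.Second)
	cp = cache.Get("doc-1") // past the TTL: reloaded from the DB
	fmt.Println(cp, loads)
}
```

When combined with the write-back flush, an entry with unflushed updates would need to be flushed before it is dropped or reloaded; the sketch leaves that step out.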
Advantages
- Avoids costly read-after-write operations during flush.
- Keeps persisted checkpoints monotonic: the $max merge never lets a flush move a stored value backward.
- Bounds how long a stale cache entry can be used via the TTL.
- Avoids the need for document ownership coordination in a stateless routing environment (e.g., Istio).
TODO
- Implement per-document in-memory Checkpoint cache on server
- Add periodic batch flush with $max-based DB merge
- Introduce TTL-based cache eviction policy
Related to #1001
Why: