
Introduce Checkpoint Cache for PushPull Optimization #1381

@hackerwins

Description:

Background

Yorkie assigns responsibility for each document to a specific server using Consistent Hashing. During PushPull and other document-related API calls, ClientInfo.Checkpoint is frequently read and updated. Under high load (such as during Presence Load Tests), this creates significant pressure on the backing DB (the clients collection in MongoDB).

(Attached video: screen-capture.webm)

To improve performance, especially for PushPull-related APIs, I propose introducing a write-back in-memory cache for Checkpoint. This will reduce DB access frequency and improve response time.

Proposed Design

  • Each server will maintain an in-memory cache of client Checkpoints for the documents it currently serves.
  • The Checkpoint structure contains serverSeq and clientSeq, both of which are monotonically increasing.
  • The cache will be periodically flushed to the DB in batches.
  • During flush, rather than overwriting the DB value, we use MongoDB’s $max operator to merge the cached and stored values:
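// Hypothetical filter; "..." are the cached sequence values to merge.
db.clients.updateOne(
  { client_id: clientID, doc_id: docID },
  { $max: { "checkpoint.server_seq": ..., "checkpoint.client_seq": ... } }
);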
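A minimal sketch of the per-server write-back cache and batched flush described above, written in JavaScript to match the snippet. The class name, the `clientID/docID` cache key, and the `client_id`/`doc_id` filter are hypothetical; only the `$max` field names come from the proposal. It builds operations in the shape MongoDB's `bulkWrite` accepts but does not talk to a database.

```javascript
// Hypothetical write-back cache for client Checkpoints on one server.
// Filter fields (client_id, doc_id) are illustrative assumptions.
class CheckpointCache {
  constructor() {
    // "clientID/docID" -> { clientID, docID, serverSeq, clientSeq, dirty }
    this.entries = new Map();
  }

  get(clientID, docID) {
    return this.entries.get(`${clientID}/${docID}`);
  }

  // Called on PushPull instead of writing to the DB. Both sequences are
  // monotonically increasing, so keep the larger of cached and incoming values.
  update(clientID, docID, serverSeq, clientSeq) {
    const cur = this.get(clientID, docID);
    this.entries.set(`${clientID}/${docID}`, {
      clientID,
      docID,
      serverSeq: cur ? Math.max(cur.serverSeq, serverSeq) : serverSeq,
      clientSeq: cur ? Math.max(cur.clientSeq, clientSeq) : clientSeq,
      dirty: true,
    });
  }

  // Builds one batch of $max updates for every dirty entry, in the
  // operation shape accepted by MongoDB's bulkWrite, and marks those
  // entries clean. Entries stay in memory (no read-back after flush).
  drainFlushBatch() {
    const ops = [];
    for (const e of this.entries.values()) {
      if (!e.dirty) continue;
      ops.push({
        updateOne: {
          filter: { client_id: e.clientID, doc_id: e.docID },
          update: {
            $max: {
              "checkpoint.server_seq": e.serverSeq,
              "checkpoint.client_seq": e.clientSeq,
            },
          },
        },
      });
      e.dirty = false;
    }
    return ops;
  }
}
```

On a timer, the returned batch could be passed to something like `db.clients.bulkWrite(ops, { ordered: false })`. This sketch marks entries clean as soon as the batch is built, so a real implementation would need to mark them dirty again if the write fails.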

Consistency Challenge: Stale Cache

In distributed environments (e.g., with Istio managing traffic routing), servers are unaware when document ownership changes due to scale-out/in or restarts. This can result in stale Checkpoint values remaining in memory, which may be used if the same server later resumes ownership of the document. Naively flushing this stale data could overwrite a newer state in the DB.
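The stale-flush risk can be illustrated without a database. This sketch assumes a scenario in which server B, the new owner, has already persisted newer sequences and server A later flushes an outdated entry. `applyMax` and `applySet` are hypothetical helpers that mimic `$max` and a plain overwrite on a flat object.

```javascript
// Mirrors MongoDB's $max rule for numeric fields: a field is written only
// if it is missing or the new value is greater. Dotted paths are kept as
// flat keys here for simplicity.
function applyMax(doc, fields) {
  for (const [path, value] of Object.entries(fields)) {
    if (!(path in doc) || value > doc[path]) doc[path] = value;
  }
  return doc;
}

// A plain overwrite, as a naive flush would do.
function applySet(doc, fields) {
  return Object.assign(doc, fields);
}

// Server B now owns the document and has already persisted newer values.
const stored = () => ({ "checkpoint.server_seq": 120, "checkpoint.client_seq": 8 });
// Server A still holds an older entry from before ownership moved.
const staleFlush = { "checkpoint.server_seq": 100, "checkpoint.client_seq": 6 };

const naive = applySet(stored(), staleFlush);
const merged = applyMax(stored(), staleFlush);
console.log(naive["checkpoint.server_seq"]);  // 100: regressed
console.log(merged["checkpoint.server_seq"]); // 120: newer value kept
```

`$max` only protects the stored value; server A could still serve its stale in-memory entry on a later request, which is the gap the TTL-based invalidation is meant to bound.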

One Proposed Strategy

To strike a balance between performance and consistency, I propose the following approach:

  • Flush behavior:
    • Use $max in MongoDB to ensure only newer values are persisted.
    • Skip reading back from the DB after the flush.
    • Retain the in-memory cache as-is.
  • Cache invalidation:
    • Use a TTL (e.g., 30 seconds) to periodically evict potentially stale entries.
    • On subsequent access (e.g., a new PushPull), reload from the DB on a cache miss.
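A minimal sketch of the invalidation rules, under assumptions the proposal does not spell out: the TTL is counted from the last load or update of an entry, and dirty entries are not evicted until they have been flushed, so an eviction never discards an unpersisted update. `TTLCheckpointCache` and `loadFromDB` are hypothetical names, and the clock is injectable so expiry can be exercised without waiting.

```javascript
// Hypothetical TTL-based invalidation with reload on miss.
// loadFromDB stands in for a read from the clients collection.
class TTLCheckpointCache {
  constructor(ttlMs, loadFromDB, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.loadFromDB = loadFromDB;
    this.now = now;
    this.entries = new Map(); // key -> { checkpoint, touchedAt, dirty }
  }

  // Read path: serve the cached value, or reload from the DB on a miss.
  get(key) {
    let e = this.entries.get(key);
    if (!e) {
      e = { checkpoint: this.loadFromDB(key), touchedAt: this.now(), dirty: false };
      this.entries.set(key, e);
    }
    return e.checkpoint;
  }

  // Write path (e.g., after PushPull): update in memory, flush later.
  put(key, checkpoint) {
    this.entries.set(key, { checkpoint, touchedAt: this.now(), dirty: true });
  }

  // Called once the batched $max flush for this key has succeeded.
  markFlushed(key) {
    const e = this.entries.get(key);
    if (e) e.dirty = false;
  }

  // Periodic eviction: drop clean entries not loaded or updated within
  // the TTL. Dirty entries are kept until flushed so no update is lost.
  sweep() {
    const cutoff = this.now() - this.ttlMs;
    for (const [key, e] of this.entries) {
      if (!e.dirty && e.touchedAt <= cutoff) this.entries.delete(key);
    }
  }
}
```

A server would call `sweep()` on a timer (for example `setInterval(() => cache.sweep(), 5000)`); after eviction, the next `get()` reloads the Checkpoint from the DB.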

Advantages

  • Avoids costly read-after-write operations during flush.
  • Keeps stored Checkpoints monotonic via the $max merge, so a stale flush cannot overwrite newer values.
  • Bounds how long a stale cache entry can be used, via the TTL.
  • Avoids the need for document ownership coordination in a stateless routing environment (e.g., Istio).

TODO

  • Implement per-document in-memory Checkpoint cache on server
  • Add periodic batch flush with $max-based DB merge
  • Introduce TTL-based cache eviction policy

Related to #1001

Why:
