Skip to content

CBG-4267 implement LWW conflict resolution for blip tester #7657

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Aug 13, 2025
Merged

Conversation

torcolvin
Copy link
Collaborator

@torcolvin torcolvin commented Aug 1, 2025

CBG-4267 implement conflict resolution for blip tester

  • only implements LWW
    • put the incoming rev + existing CBL rev into mv
    • create new CBL rev
    • mark last replicated version as the incoming rev, so the new rev is replicated to SG
  • Rename HybridLogicalVector.Equals to HybridLogicalVector.Equal to match go idioms
  • Update proposeChanges message to include full HLV to be able to push the version back to SG
  • It is OK for sendRev to return a 409, do not cause blip tester to fail. CBL getting a 409 is expected until CBL resolves the version locally.

This was tested with the topology tests, but it does not fully work. There's a chance the implementation needs to be improved, but this is a good start. The only test that hits this code pathway for conflicts is the new test I wrote, the topology tests are currently skipped.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

@Copilot Copilot AI review requested due to automatic review settings August 1, 2025 13:37
Copilot

This comment was marked as outdated.

@torcolvin torcolvin requested a review from Copilot August 5, 2025 18:21
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements Last-Write-Wins (LWW) conflict resolution for the BLIP tester client to better simulate Couchbase Lite behavior during testing. The implementation adds conflict detection and resolution capabilities when the client receives revisions that conflict with local versions.

Key changes:

  • Implements LWW conflict resolution algorithm that compares version values to determine the winner
  • Adds conflict resolution infrastructure including winner determination and HLV merging logic
  • Updates BLIP message handlers to support conflict resolution workflows

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
rest/utilities_testing_blip_client.go Core implementation of LWW conflict resolution, message handler updates, and HLV management
rest/blip_api_crud_test.go New test case to verify conflict resolution behavior
db/hybrid_logical_vector.go Utility methods for HLV operations and debug output
db/util_testing.go Minor method name update for consistency
topologytest/multi_actor_no_conflict_test.go Skip CBL tests due to matching CV but different PVs
Comments suppressed due to low confidence (1)

rest/utilities_testing_blip_client.go:46

  • [nitpick] The string value "lww" is quite abbreviated. Consider using a more descriptive value like "last_write_wins" for better clarity.
	ConflictResolverLastWriteWins BlipTesterClientConflictResolverType = "lww"

torcolvin and others added 3 commits August 5, 2025 14:22
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@@ -276,6 +299,50 @@ func (cd *clientDoc) _proposeChangesEntryForDoc() *proposeChangeBatchEntry {
return &proposeChangeBatchEntry{docID: cd.id, version: latestRev.version, revTreeIDHistory: revTreeIDHistory, hlvHistory: latestRev.HLV, latestServerVersion: cd._latestServerVersion, seq: cd._latestSeq, isDelete: latestRev.isDelete}
}

// _resolveConflict will resolve a conflict for the given incoming version. This will mutate the hlv, and return the expected body and version to write.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually I'd consider "resolving the conflict" as a separate step from "detecting a conflict", like the way we do for ISGR (see db.IsIllegalConflict and db.resolveConflict). I think this is doing both, and would be easier to follow if the two tasks were separated out. Let me know if there's something about HLVs that makes it more logical to do both at once.

if localCV.Equal(incomingCV) {
require.FailNow(t, fmt.Sprintf("incoming CV %#+v is equal to local CV revision %#+v - this should've been filtered via changes response before ending up as a rev", incomingCV, r))
}
if incomingCV.SourceID == localCV.SourceID {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also check incomingCV.Value == localCV.Value. I think you're basing this on that if the SourceIDs are the same that this isn't actually a conflict and we shouldn't be invoking this code in the first place, but as in the comment above I think there are race conditions between the changes response and the rev message where either the incomingCV or the localCV should be the winner, and we need to wait for the conflictResolver to make that decision.

if localCV.Equal(incomingCV) {
require.FailNow(t, fmt.Sprintf("incoming CV %#+v is equal to local CV revision %#+v - this should've been filtered via changes response before ending up as a rev", incomingCV, r))
}
if incomingCV.SourceID == localCV.SourceID {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also check incomingCV.Value == localCV.Value. Possibly you've omitted that based on the idea that if the SourceIDs are the same that this isn't actually a conflict and we shouldn't be invoking this code in the first place? But otherwise if the two revisions are by the same source, I think we'd always want to pick the one with the larger Value, which isn't necessarily the remote (for the same race conditions described above)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the right behavior for:

  • incomingCV.sourceID == localCV.sourceID && incomingCV.Value == localCV.Value
    • Is it possible that these would be equal but the pv or mv are not equal?
  • incomingCV.sourceID == localCV.sourceID && incomingCV.Value < localCV.Value
    - Is it possible that these would be equal but the pv or mv are not equal?

I propose to fail right now because this is a condition isn't possible in the current implementation and to implement this in a future ticket.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first case, it's the same revision and it's not a conflict (regardless of whether PV or MV are equal due to asymmetric pruning, etc).
For the second case this isn't a conflict (local dominates incoming).

@@ -1260,6 +1332,11 @@ func (btcc *BlipTesterCollectionClient) sendRev(ctx context.Context, change prop
btcc.sendRev(ctx, change, false)
return
}
if errorCode == strconv.Itoa(http.StatusConflict) {
// If there is a conflict created between the proceeding proposeChanges and this rev message.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think some info/debug logging for this case is warranted so that this scenario is more visible in test output? I feel like in many cases it would be unexpected?

docRev.version = opts.incomingVersion
if btcc.UseHLV() {
// Add the incoming HLV to the local HLV, regardless of winner
require.NoError(btcc.TB(), docRev.HLV.AddNewerVersions(opts.incomingHLV))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a non-conflict case, I think we just be using the incomingHLV as the HLV, as we've determined that localRev.HLV is already included in its causal history.

There's a corner case where there has been asymmetric pruning of the HLV on client and server, but I don't think that's tested by the BlipTestClient. And even if it was, and we wanted to preserve those, I think we'd want to restore those by doing something like incomingHLV.AddNewerVersions(docRev.HLV).

More generally, I think this is going to be harder to maintain/debug if the non-conflict code path and the "conflict, remoteWins" cases are the same - I think this goes back to not using 'pickConflictWinner' unless there's actually a conflict.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we still need to write MV in the case of an actual conflict where we pick the remote as winner, right? That doesn't look like it's being handled here.

var updatedHLV *db.HybridLogicalVector
if hasLocalDoc {
localRev = doc._latestRev(btcc.TB())
winner = localRev._pickConflictWinner(btcc.TB(), opts.incomingVersion.CV, btcc.parent.ConflictResolver)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still isn't really separated out in the way I was thinking. I was suggesting to separate "detecting a conflict" from "resolving a conflict", where:

  • "detecting a conflict": Do the incoming and local HLVs represent a conflict?
  • "resolving a conflict": Running the conflict resolver to compute the winner and the new body (if merge), and computing the combined HLV

Here "resolving a conflict" is only run if it's actually a conflict, otherwise it's the normal update case.

// - add the merge sourceIDs with the incoming version and local version
// - update the new CV
delete(docRev.HLV.PreviousVersions, opts.incomingHLV.SourceID)
delete(docRev.HLV.PreviousVersions, btcc.parent.SourceID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this use localRev.HLV.SourceID and not btcc.parent.SourceID?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we care about removing the sourceID that comes from merge versions and the sourceID from cv (btcc.parent.SourceID)

doc._latestServerVersion = opts.incomingVersion
}
// if we resolved a conflict, then we might need to push this conflict back
if winner == localDocWinner {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above, localDocWinner isn't the only possibility when resolving a conflict.

@adamcfraser adamcfraser assigned torcolvin and unassigned adamcfraser Aug 5, 2025
@@ -500,6 +555,7 @@ func (btr *BlipTesterReplicator) handleChanges(btc *BlipTesterClient) func(*blip
knownRevs[i] = []interface{}{} // sending empty array means we've not seen the doc before, but still want it
continue
} else if localHLV.DominatesSource(changesVersion) {
base.DebugfCtx(ctx, base.KeySGTest, "Skipping changes for incoming doc %q with rev %d@%s as we already have a newer version %#+v", docID, changesVersion.Value, changesVersion.SourceID, localHLV)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I want to change rev %d@%s

Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. No functional concerns, just a question about whether FailNow is appropriate in one spot, and a clarifying question about CBL behaviour.

// NOTE: without P2P replication, this should not be the case and we would not get this revision, since CBL
// would respond to a SG changes message that CBL does not need this revision
if localHLV.DominatesSource(*incomingCV) {
require.FailNow(t, fmt.Sprintf("incoming CV %#+v has lower version than the local revision %#+v - this should've been filtered via changes response before ending up as a rev. blip tester would reply that to Sync Gateway that it doesn't need this revision", incomingCV, localHLV))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this should fail the test at this step (I think so that we can validate that changes processing is working as expected?). Wouldn't it be preferable to have this match CBL behaviour? I'm concerned that this could incorrectly handle other scenarios where the changes message isn't sufficient to identify whether local dominates incoming (maybe CBL switching clusters?).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should relax this behavior, but I'd prefer to do it in a separate PR when I can find out that is necessary and how to handle it. Right now it is an assertion of the correct behavior while development.

Until I enable tests that would cause this I do not want to relax the assertion since it is more likely to catch problems.

updatedHLV := latestLocalRev.HLV.Copy()
// resolve conflict in favor of remote document
if incomingHLV.Version > latestLocalHLV.Version {
require.NoError(btcc.TB(), updatedHLV.AddNewerVersions(incomingHLV))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is CBL not updating the merge version in this scenario? That might be reasonable, and is similar to what CBL used to do for revTrees. If so we will want to update the design doc accordingly.

@torcolvin torcolvin assigned adamcfraser and unassigned torcolvin Aug 11, 2025
adamcfraser
adamcfraser previously approved these changes Aug 12, 2025
@torcolvin torcolvin enabled auto-merge (squash) August 13, 2025 12:47
@torcolvin torcolvin merged commit 1190eac into main Aug 13, 2025
42 checks passed
@torcolvin torcolvin deleted the CBG-4267 branch August 13, 2025 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants