[FLINK-38218] Fix MySQL CDC binlog split metadata split transmission #4087

Open: wants to merge 5 commits into master

morozov commented Aug 8, 2025

The root cause is that the binlog split metadata transfer protocol relies on the order of finished snapshot split infos being stable and matching the order of split assignment (the infos of newly added or newly snapshotted tables are appended to the end of the list). However, when MySqlSnapshotSplitAssigner is restored from state, the entries of assignedSplits are reordered, which breaks this assumption.
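A simplified model of why the order matters, written as a hypothetical sketch: the class and method names are illustrative rather than Flink CDC's API, and the real protocol transfers the infos in groups rather than as one flat list. The reader keeps the infos it already holds and appends whatever sits past that position in the enumerator's list. Once a restore reorders the list (simulated here with a hand-written reordering), the same position arithmetic returns an info the reader already has and skips the new table's split.

```java
import java.util.ArrayList;
import java.util.List;

class PositionalTransferDemo {
    // Hypothetical reader-side step: the reader already holds the first
    // held.size() infos and appends everything past that position.
    static List<String> appendNew(List<String> held, List<String> sentByEnumerator) {
        List<String> result = new ArrayList<>(held);
        result.addAll(sentByEnumerator.subList(held.size(), sentByEnumerator.size()));
        return result;
    }

    public static void main(String[] args) {
        // Infos the reader received before the restart, in assignment order.
        List<String> held = List.of("orders:0", "orders:1");
        // Enumerator's list in assignment order: the new table's split is appended last.
        List<String> stable = List.of("orders:0", "orders:1", "new_table:0");
        // The same splits after a restore that did not preserve the order (simulated).
        List<String> reordered = List.of("new_table:0", "orders:0", "orders:1");

        System.out.println(appendNew(held, stable));    // [orders:0, orders:1, new_table:0]
        System.out.println(appendNew(held, reordered)); // [orders:0, orders:1, orders:1]
    }
}
```

In the second call, `orders:1` appears twice and `new_table:0` is never delivered, even though both lists contain exactly the same splits.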

Change summary

  1. Require assigned snapshot splits to be ordered. This isn't strictly necessary to fix the bug, but it follows directly from the JavaDoc I added to MySqlSnapshotSplitAssigner#assignedSplits: if the order is important, the type should guarantee that it's preserved. Note the changes in the deserialization code. Not using an ordered map there while the order is important may cause other hard-to-diagnose issues.
  2. Rely on the stable order of assigned splits. Instead of identifying duplicate received split infos by split ID, ignore the first N elements that we know we already have.
  3. Eliminate code duplication in MySqlBinlogSplit constructors. There are currently two constructors that don't delegate to each other. The subsequent commit adds a check that needs to be enforced regardless of which constructor was used, so I'm combining them.
  4. Enforce no duplicate finished snapshot split infos in MySqlBinlogSplit. By design, a binlog split cannot contain duplicate finished snapshot split infos. If it does, the split was constructed incorrectly; that is a bug, and we want to fail as early as possible.
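A minimal sketch of items 2 to 4 under simplifying assumptions: `BinlogSplitSketch` is a hypothetical stand-in for MySqlBinlogSplit that tracks finished snapshot splits as plain ID strings, and `appendReceived` assumes the received list starts at position 0 of the enumerator's assignment order. The convenience constructor delegates to the canonical one so the duplicate check always runs, and a reordered input fails fast instead of producing a corrupt split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BinlogSplitSketch {
    private final String splitId;
    private final List<String> finishedSnapshotSplitIds;
    private final int totalFinishedSplitSize;

    // Convenience constructor (hypothetical): delegates to the canonical one,
    // so the duplicate check below runs no matter which constructor is used.
    BinlogSplitSketch(String splitId, List<String> finishedSnapshotSplitIds) {
        this(splitId, finishedSnapshotSplitIds, finishedSnapshotSplitIds.size());
    }

    // Canonical constructor: rejects duplicate finished snapshot split infos.
    BinlogSplitSketch(String splitId, List<String> finishedSnapshotSplitIds, int totalFinishedSplitSize) {
        Set<String> seen = new HashSet<>();
        for (String id : finishedSnapshotSplitIds) {
            if (!seen.add(id)) {
                throw new IllegalArgumentException("Duplicate finished snapshot split info: " + id);
            }
        }
        this.splitId = splitId;
        this.finishedSnapshotSplitIds = Collections.unmodifiableList(new ArrayList<>(finishedSnapshotSplitIds));
        this.totalFinishedSplitSize = totalFinishedSplitSize;
    }

    // Appends received infos, ignoring the first N elements the split already
    // holds instead of de-duplicating by split ID (relies on a stable order).
    BinlogSplitSketch appendReceived(List<String> received) {
        int alreadyHave = Math.min(finishedSnapshotSplitIds.size(), received.size());
        List<String> merged = new ArrayList<>(finishedSnapshotSplitIds);
        merged.addAll(received.subList(alreadyHave, received.size()));
        return new BinlogSplitSketch(splitId, merged, totalFinishedSplitSize);
    }

    List<String> finishedSnapshotSplitIds() {
        return finishedSnapshotSplitIds;
    }

    public static void main(String[] args) {
        BinlogSplitSketch split = new BinlogSplitSketch("binlog-split", List.of("orders:0", "orders:1"));
        // Stable order: only "new_table:0" lies past the two infos already held.
        System.out.println(split.appendReceived(List.of("orders:0", "orders:1", "new_table:0")).finishedSnapshotSplitIds());
        // Reordered input: skipping two elements yields a duplicate, so construction fails.
        try {
            split.appendReceived(List.of("new_table:0", "orders:0", "orders:1"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Skipping by position is only safe because item 1 makes the order a guarantee of the type; the duplicate check in the canonical constructor turns any remaining ordering mistake into an immediate failure rather than a silently incomplete split.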

morozov marked this pull request as ready for review on August 8, 2025 21:39

morozov commented Aug 8, 2025

I'm not sure how to test this. The issue is reproducible when the source is restarted in the middle of snapshotting a newly added table and then has to consume that table's changes from the binlog. Could maintainers recommend an existing test on top of which I could build this?

morozov force-pushed the FLINK-38218-fix-binlog-split-construction branch from 94b2fb3 to 1069e6e on August 11, 2025 20:22
morozov force-pushed the FLINK-38218-fix-binlog-split-construction branch from 1069e6e to 9f16356 on August 13, 2025 23:33