Skip to content

[FLINK-38139] Fix consecutive online schema change causes job failure. #4064

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lvyanquan
Copy link
Contributor

@lvyanquan lvyanquan commented Jul 24, 2025

Fix potential exception when online schema change happened.

Why did this happen?
Because when making table schema changes, the original table was not locked.
And the online schema change corresponds to a set of SQL statements, which were not processed completely before being issued. There may be a situation where the table structure does not correspond.

private static final Pattern OSC_TABLE_ID_PATTERN = Pattern.compile("^_(.*)_(gho|new)$");

private static final Pattern OSC_TEMP_TABLE_ID_PATTERN = Pattern.compile("^_(.*)_(del|old)$");
private static final Pattern RDS_OGT_TEMP_TABLE_ID_PATTERN =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to support RDS OGT later as it could not be tested now.

Optional<String> finishedTables =
RecordUtils.parseOnLineSchemaRenameEvent(event.getRecord());
if (finishedTables.isPresent()) {
TableId tableId = RecordUtils.getTableId(event.getRecord());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When execute gh-ost /gh-ost --user="root" --password="XXXX" --host="localhost" --database="dinky" --table="dinky_user" --alter="ADD COLUMN czy04 DATETIME NULL AFTER update_time" --allow-on-master --ok-to-drop-table --initially-drop-ghost-table --initially-drop-old-table --execute,Have this error schema can't remove on BinlogSplitReader. pendingSchemaChangeEvents

If have del table,RecordUtils.getTableId have ',' with del and finishedTable.That will error record to ddl remove

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2025-08-04 16:57:01,623 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Checking if DDL might be an OSC renaming event... rename /* gh-ost */ table `dinky`.`dinky_user` to `dinky`.`_dinky_user_del`
2025-08-04 16:57:01,624 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Determined the shorter TableId dinky_user is the renaming source.
2025-08-04 16:57:01,624 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Determined the longer TableId _dinky_user_del is the renaming target.
2025-08-04 16:57:01,624 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Renamed to TableId name _dinky_user_del matches OSC temporary TableId pattern, yield dinky_user.
2025-08-04 16:57:01,624 INFO  org.apache.flink.cdc.connectors.mysql.debezium.reader.BinlogSplitReader  - Received the ending event of  tableId:dinky._dinky_user_del,dinky_user,finishedTableId:dinky.dinky_user
2025-08-04 16:57:01,624 INFO  org.apache.flink.cdc.connectors.mysql.debezium.reader.BinlogSplitReader  - Received the ending event of table dinky.dinky_user. Emit corresponding DDL event now.
2025-08-04 16:57:01,624 INFO  org.apache.flink.cdc.connectors.mysql.debezium.reader.BinlogSplitReader  - remove pendingSchemaChangeEvents:dinky.dinky_user
2025-08-04 16:57:01,626 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Checking if DDL might be an OSC renaming event... rename /* gh-ost */ table `dinky`.`_dinky_user_gho` to `dinky`.`dinky_user`
2025-08-04 16:57:01,626 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Determined the shorter TableId dinky_user is the renaming source.
2025-08-04 16:57:01,626 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Determined the longer TableId _dinky_user_del is the renaming target.
2025-08-04 16:57:01,626 INFO  org.apache.flink.cdc.connectors.mysql.source.utils.RecordUtils  - Renamed to TableId name _dinky_user_del matches OSC temporary TableId pattern, yield dinky_user.
2025-08-04 16:57:01,626 INFO  org.apache.flink.cdc.connectors.mysql.debezium.reader.BinlogSplitReader  - Received the ending event of  tableId:dinky._dinky_user_del,dinky_user,finishedTableId:dinky.dinky_user
2025-08-04 16:57:01,626 INFO  org.apache.flink.cdc.connectors.mysql.debezium.reader.BinlogSplitReader  - Received the ending event of table dinky.dinky_user. Emit corresponding DDL event now.
2025-08-04 16:57:01,626 ERROR org.apache.flink.cdc.connectors.mysql.debezium.reader.BinlogSplitReader  - Error: met an unexpected osc finish event. Current pending events: {}, Record: DataChangeEvent [record=SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={transaction_id=null, ts_sec=1754297821, file=mysql-bin.000003, pos=181321, server_id=1}} ConnectRecord{topic='mysql_binlog_source', kafkaPartition=0, key=Struct{databaseName=dinky}, value=Struct{source=Struct{version=1.9.8.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1754297821468,db=dinky,table=_dinky_user_del,dinky_user,server_id=1,file=mysql-bin.000003,pos=181119,row=0},historyRecord={"source":{"file":"mysql-bin.000003","pos":181119,"server_id":1},"position

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants