In the Iceberg equality-delete-as-join solution, why do we need to use the partition field as a join key? #25198
I noticed that in this implementation, in addition to data_sequence_number and equality_ids, the partition field is also used as a join key. Why is that?
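According to the Iceberg specification, an equality delete file must be applied to a data file when all of the following are true:

- The data file's data sequence number is strictly less than the delete's data sequence number.
- The data file's partition (both spec id and partition values) is equal to the delete file's partition, or the delete file's partition spec is unpartitioned.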
So we should add the partition fields to the join key as well, to ensure that each delete file is only applied to data files located in the same partition. See here for reference. Hope this helps.
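A minimal Python sketch of this rule, treating the equality delete as a hash anti-join. It is not the project's actual implementation: `Row`, `EqDelete`, `apply_equality_deletes`, and the `use_partition_key` flag are hypothetical, column names stand in for Iceberg equality field ids, and an empty partition tuple is assumed to mark a delete written with an unpartitioned spec (which the Iceberg spec applies globally). The join key is (equality_ids, partition, equality values), and the sequence-number condition is applied as a filter on the match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical models (assumptions): column names stand in for equality field ids,
# and an empty partition tuple marks a delete written with an unpartitioned spec.
@dataclass
class Row:
    spec_id: int
    partition: Tuple
    data_seq: int  # data sequence number of the data file holding this row
    values: Dict[str, object]


@dataclass
class EqDelete:
    spec_id: int
    partition: Tuple
    data_seq: int  # data sequence number of the equality delete file
    equality_ids: Tuple[str, ...]
    values: Dict[str, object]


PartKey = Optional[Tuple[int, Tuple]]  # None means "global" (unpartitioned delete)


def apply_equality_deletes(
    rows: List[Row], deletes: List[EqDelete], use_partition_key: bool = True
) -> List[Row]:
    # Build side: (equality_ids, partition key, equality values) -> max delete seq.
    index: Dict[Tuple[Tuple[str, ...], PartKey, Tuple], int] = {}
    for d in deletes:
        is_global = d.partition == () or not use_partition_key
        part_key: PartKey = None if is_global else (d.spec_id, d.partition)
        key = (d.equality_ids, part_key, tuple(d.values[c] for c in d.equality_ids))
        index[key] = max(index.get(key, -1), d.data_seq)

    eq_id_sets = {d.equality_ids for d in deletes}
    kept: List[Row] = []
    # Probe side: drop a row if a matching delete (same partition or global)
    # has a strictly greater data sequence number.
    for r in rows:
        deleted = False
        for ids in eq_id_sets:
            vals = tuple(r.values.get(c) for c in ids)
            for pk in ((r.spec_id, r.partition), None):
                seq = index.get((ids, pk, vals))
                if seq is not None and r.data_seq < seq:
                    deleted = True
        if not deleted:
            kept.append(r)
    return kept


rows = [
    Row(0, ("a",), 1, {"id": 1, "v": "x"}),  # old row in partition a
    Row(0, ("b",), 1, {"id": 1, "v": "y"}),  # same id, different partition
    Row(0, ("a",), 3, {"id": 1, "v": "z"}),  # written after the delete
]
deletes = [EqDelete(0, ("a",), 2, ("id",), {"id": 1})]  # "delete id = 1" in partition a

print([r.values["v"] for r in apply_equality_deletes(rows, deletes)])  # ['y', 'z']
print([r.values["v"] for r in apply_equality_deletes(rows, deletes, use_partition_key=False)])  # ['z']
```

Dropping the partition from the key effectively turns every delete into a global one, so the row with id = 1 in partition b is wrongly removed; that over-deletion is what the partition join key prevents. Keeping only the maximum delete sequence number per key is one way to express the "strictly less than" condition once the join has matched.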
Oh, yes, that helps me a lot. Thanks!