Add distinct to the right side when LOJ + IS NULL is rewritten to Semijoin #24884
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java
Fix the title to be more specific: add distinct to the right side when LOJ + IS NULL is rewritten to Semijoin.
Also, this only partially addresses the linked issue, because there could be other direct uses of semijoin that we don't address. This just improves on the original optimization.
Add a couple more tests:
d1e556a to 98c91f4
LGTM
@jaystarshot - please take a look. Thanks!
Can you please also add a release note?
This only applies an aggregation if the previous join is a left join. Isn't it better to apply this to all semi joins for the general case (as mentioned in the issue)?
Hi @jaystarshot, thanks for your review. Yes, we are going to follow up by extracting this into support for the more general case. We will keep the linked issue updated.
Thanks for the release note! Some formatting nits.
Hi @jaystarshot, could you help with another stamp after the rebase? Thank you.
Description
issue #24510
The join might have a huge right side when the following optimization applies:
When a left join has an IS NULL filter on the right-side join key, the query effectively returns the rows from the left side that have no match on the right side. This is currently optimized by converting that part of the plan into a left semi join. The problem is that the right side (key only) might contain a large amount of duplication that is completely unnecessary for evaluating the join. This PR addresses it by adding a distinct aggregation operator on the right side, before the semi join, to improve performance.
Selected Meta internal production queries showed a 100x performance gain and a 3x memory reduction.
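
To illustrate why deduplicating the right side is safe, here is a minimal sketch that runs both query shapes in SQLite through Python's `sqlite3` module. It is not the Presto planner code from this PR. The table names (`orders`, `events`) and the data are hypothetical, and SQLite only stands in for Presto to show that the left join + IS NULL form and a form that checks against distinct right-side keys return the same rows.

```python
import sqlite3

# In-memory database with hypothetical tables; not part of the PR.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE events (customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30), (4, 20);
    -- The right side carries many duplicate keys.
    INSERT INTO events VALUES (10), (10), (10), (30), (30);
""")

# Original shape: left join plus an IS NULL filter on the right-side key.
left_join_is_null = conn.execute("""
    SELECT o.order_id
    FROM orders o
    LEFT JOIN events e ON o.customer_id = e.customer_id
    WHERE e.customer_id IS NULL
    ORDER BY o.order_id
""").fetchall()

# Same question asked against deduplicated right-side keys.
dedup_right_side = conn.execute("""
    SELECT o.order_id
    FROM orders o
    WHERE NOT EXISTS (
        SELECT 1
        FROM (SELECT DISTINCT customer_id FROM events) d
        WHERE d.customer_id = o.customer_id
    )
    ORDER BY o.order_id
""").fetchall()

# How much smaller the right side becomes once duplicates are removed.
right_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
distinct_keys = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM events"
).fetchone()[0]

print(left_join_is_null, dedup_right_side, right_rows, distinct_keys)
```

Both queries return the orders whose customer has no matching event, while the deduplicated right side holds only the distinct keys. Only the existence of a key matters for this check, so the duplicates add work without changing the result.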