Skip to content

Conversation

tanjialiang
Copy link
Contributor

@tanjialiang tanjialiang commented Apr 8, 2025

Description

issue #24510

The join might have a huge right side in cases of following optimization:
When left join has the 'is null' key filter on the right side, it is effectively making the query return rows from the left side where there is a no match on the right side. This currently is optimized by converting the section into a left semi join. But the current issue is the right side (key only) might have large amount of duplication that is completely unnecessary for evaluation of the join. This PR addresses it by adding a distinct aggregation operator before to optimize the performance

Selective Meta internal production queries showed 100x performance gain and 3x memory reduction

== RELEASE NOTE ==
General Changes
Add distinct on the right side optimization on optimizer rule ``LeftJoinNullFilterToSemiJoin``.

@kaikalur
Copy link
Contributor

kaikalur commented Apr 8, 2025

Fix the title to be more specific - add distinct to the right side when LOJ + IS NULL is rewritten to Semijoin

@tanjialiang tanjialiang changed the title Add distinct semi join plan optimization Add distinct to the right side when LOJ + IS NULL is rewritten to Semijoin Apr 8, 2025
@kaikalur
Copy link
Contributor

kaikalur commented Apr 8, 2025

Also this only partially addresses the issue linked because there could be other direct uses of semijoin that we don't address. This just improves on the original optimization.

@kaikalur
Copy link
Contributor

kaikalur commented Apr 8, 2025

Add couple more tests:

  1. already distinct on the right side (and make sure you don't get double distinct - not incorrect but just unncessary)
  2. add group by like: .. (select custkey from orders group by custkey having count(1) > 5) and see if you get the correct plan

@tanjialiang tanjialiang force-pushed the semi_join branch 2 times, most recently from d1e556a to 98c91f4 Compare April 8, 2025 21:38
kaikalur
kaikalur previously approved these changes Apr 8, 2025
Copy link
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kaikalur
Copy link
Contributor

kaikalur commented Apr 8, 2025

@jaystarshot - please take a look. Thanks!

jaystarshot
jaystarshot previously approved these changes Apr 9, 2025
Copy link
Member

@jaystarshot jaystarshot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add a release note?

@jaystarshot
Copy link
Member

This only applies an agg if the previous join is a Left join, isn't it better to apply this to all semi joins for a general case (as mentioned in the issue) ?

@tanjialiang
Copy link
Contributor Author

This only applies an agg if the previous join is a Left join, isn't it better to apply this to all semi joins for a general case (as mentioned in the issue) ?

Hi @jaystarshot, thanks for your review. Yeah we are going to follow up on extracting this out as a more general case support as a followup. We will keep it updated in the linked issue.

@steveburnett
Copy link
Contributor

Thanks for the release note! Some formatting nits.

== RELEASE NOTE ==

General Changes
Add distinct on the right side optimization on optimizer rule ``LeftJoinNullFilterToSemiJoin``.

@tanjialiang
Copy link
Contributor Author

Hi @jaystarshot could you help with another stamp after rebase? Thank you

@tanjialiang tanjialiang requested a review from jaystarshot April 10, 2025 18:27
@tanjialiang tanjialiang merged commit a972bba into prestodb:master Apr 10, 2025
97 checks passed
@ZacBlanco ZacBlanco mentioned this pull request May 29, 2025
21 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:Meta PR from Meta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants