-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Fix localexchange properties when spill is enabled #23922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
78ce74c
to
b1adea0
Compare
@@ -323,7 +322,16 @@ public ActualProperties build() | |||
{ | |||
List<LocalProperty<VariableReferenceExpression>> localProperties = this.localProperties; | |||
if (unordered) { | |||
localProperties = filteredCopy(this.localProperties, property -> !property.isOrderSensitive()); | |||
ImmutableList.Builder<LocalProperty<VariableReferenceExpression>> newLocalProperties = ImmutableList.builder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Native side doesn't need any change on the query shape if the spill is enabled. So can we fix in that way? I assume the query shape without spill enabled is correct for native but only when join spill session property is enabled? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use the same plan, join spill algorithm needs to preserver order i.e. output is the same as when no spill.
@@ -323,7 +322,16 @@ public ActualProperties build() | |||
{ | |||
List<LocalProperty<VariableReferenceExpression>> localProperties = this.localProperties; | |||
if (unordered) { | |||
localProperties = filteredCopy(this.localProperties, property -> !property.isOrderSensitive()); | |||
ImmutableList.Builder<LocalProperty<VariableReferenceExpression>> newLocalProperties = ImmutableList.builder(); | |||
for (LocalProperty<VariableReferenceExpression> property : this.localProperties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know G(unique) is the first in this case? and are we able to guarantee that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we do not need to guarantee this, as which one is earlier depends on other code which produce this property (in this case it's the property generation code for AssignUniqueId) and not related to this part. This code only needs to make sure that properties after removal is still valid
if (!property.isOrderSensitive()) { | ||
newLocalProperties.add(property); | ||
} | ||
else { | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what these props mean.
Here we are discarding all properties after and including the 1st order sensitive one.
Is this the intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the intent. It also takes me some time to understand this part of logic, the property is hierarchical and if one is invalid, all following will also be invalid
b1adea0
to
0234230
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a unit test also? Maybe to TestLocalProperties
Also, for anyone else who is confused, want to share a more detailed explanation that I got from conversation with @feilong-liu:
Without spill, we have the properties G(unique), C(custkey), C(totalprice) where unique is the unique key generated by the AssignUniqueId operator. That means that we are grouped by the unique key, and within each group, custkey and totalprice are constant. (if two rows have the same value for unique, they will have the same value for custkey and total price).
when spill is enabled, we assume that the order of rows is not maintained, so we mark it as unordered (https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/PropertyDerivations.java#L394). When this field is set, we were removing any order sensitive properties from the list. In this case G(unique) is considered order sensitive, so we were removing that. However, if you remove the G(unique) property, but keep the C(custkey) and C(totalprice) properties, the planner will think that custkey and totalprice are always constant, which is not true. And that's what resulted in the incorrect plan.
Instead, you need to remove all properties following the order sensitive one because the properties are hierarchical, so the following properties are defined within the context of the previous ones.
0234230
to
4ede454
Compare
Description
This is discovered by a bug, which can be replayed with the following query:
When executing this query in Presto CPP (i.e. native execution is true) and with
set session spill_enabled=true
, it will produce a wrong query plan, where the Aggregation on the top is a streaming aggregation which is wrong.After fix with this PR
After debugging, it's due to the fact that when spill is enabled, the order sensitive local properties will be removed. However, the local properties has hierarchy and cannot be simply removed. In this example, the properties are G(unique), C(custkey), C(totalprice) where unique is the unique key generated by the AssignUniqueId operator. Current logic will simply remove the G(unique) property and keeping the rest of the two. This is not right as the rest will not hold true after the first is removed.
This PR fix it by discarding the rest of properties if any previous properties are removed.
Motivation and Context
Bug fix
Impact
Fix potential incorrect plan
Test Plan
Add unit tests
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.