Skip to content

Conversation

feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Oct 31, 2024

Description

This is discovered by a bug, which can be replayed with the following query:

WITH t1 AS (
    SELECT
        MAX(totalprice) maxprice,
        MIN(totalprice) minprice,
        custkey ckey
    FROM orders
    GROUP BY
        custkey
),
t2 AS (
    SELECT
        custkey,
        totalprice,
        (
            SELECT
                maxprice
            FROM t1
            WHERE
                ckey = custkey
        ) maxprice,
        (
            SELECT
                minprice
            FROM t1
            WHERE
                ckey = custkey
        ) minprice
    FROM orders
)
SELECT
    custkey
FROM t2
WHERE
    (
        totalprice BETWEEN minprice AND maxprice
    )
GROUP BY
    custkey;

When executing this query in Presto CPP (i.e. native execution is true) and with set session spill_enabled=true, it will produce a wrong query plan, where the Aggregation on the top is a streaming aggregation which is wrong.

presto:tpch> explain (type distributed) with t1 as (select max(totalprice) maxprice, min(totalprice) minprice, custkey ckey from orders group by custkey), t2 as (select custkey, totalprice, (select maxprice from t1 where ckey = custkey) maxprice, (select minprice from t1 where ckey=custkey) minprice from orders) select custkey from t2 where (totalprice between minprice and maxprice) group by custkey;
                                                                                                                                                           >
----------------------------------------------------------------------------------------------------------------------------------------------------------->
 Fragment 0 [SINGLE]                                                                                                                                       >
     Output layout: [custkey]                                                                                                                              >
     Output partitioning: SINGLE []                                                                                                                        >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Output[PlanNodeId 74][custkey] => [custkey:bigint]                                                                                                  >
         - RemoteSource[1] => [custkey:bigint]                                                                                                             >
                                                                                                                                                           >
 Fragment 1 [HASH]                                                                                                                                         >
     Output layout: [custkey]                                                                                                                              >
     Output partitioning: SINGLE []                                                                                                                        >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Aggregate(STREAMING)[custkey][PlanNodeId 69] => [custkey:bigint]                                                                                    >
         - FilterProject[PlanNodeId 912,693][filterPredicate = (SWITCH(is_distinct, WHEN(BOOLEAN'true', BOOLEAN'true'), CAST(fail(INTEGER'28', VARCHAR'Scal>
             - MarkDistinct[PlanNodeId 691][distinct=unique:bigint marker=is_distinct] => [custkey:bigint, totalprice:double, max:double, unique:bigint, mi>
                 - LeftJoin[PlanNodeId 702][("custkey" = "custkey_34")][$hashvalue_108, $hashvalue_112] => [custkey:bigint, totalprice:double, max:double, >
                         Distribution: PARTITIONED                                                                                                         >
                     - AssignUniqueId[PlanNodeId 690] => [custkey:bigint, totalprice:double, max:double, $hashvalue_108:bigint, unique:bigint]             >
                         - FilterProject[PlanNodeId 909,711][filterPredicate = SWITCH(is_distinct_97, WHEN(BOOLEAN'true', BOOLEAN'true'), CAST(fail(INTEGER>
                                 $hashvalue_108 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:294)                      >
                             - Project[PlanNodeId 1763][projectLocality = LOCAL] => [custkey:bigint, totalprice:double, unique_96:bigint, max:double, is_di>
                                 - MarkDistinct[PlanNodeId 709][distinct=custkey:bigint, totalprice:double, unique_96:bigint marker=is_distinct_97][$hashva>
                                     - LocalExchange[PlanNodeId 1619][HASH][$hashvalue] (custkey) => [custkey:bigint, totalprice:double, unique_96:bigint, >
                                         - Project[PlanNodeId 1762][projectLocality = LOCAL] => [custkey:bigint, totalprice:double, $hashvalue_101:bigint, >
                                                 $hashvalue_107 := combine_hash(combine_hash(combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey),>
                                             - LeftJoin[PlanNodeId 720][("custkey" = "custkey_1")][$hashvalue_101, $hashvalue_106] => [custkey:bigint, tota>
                                                     Distribution: PARTITIONED                                                                             >
                                                 - AssignUniqueId[PlanNodeId 708] => [custkey:bigint, totalprice:double, $hashvalue_101:bigint, unique_96:b>
                                                         Estimates: {source: CostBasedSourceInfo, rows: 15,000 (131.84kB), cpu: 1,215,000.00, memory: 0.00,>
                                                     - RemoteSource[2] => [custkey:bigint, totalprice:double, $hashvalue_101:bigint]                       >
                                                 - Project[PlanNodeId 1761][projectLocality = LOCAL] => [custkey_1:bigint, max:double, $hashvalue_106:bigin>
                                                         $hashvalue_106 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_1), BIGINT'0')) (1:>
                                                     - Aggregate(FINAL)[custkey_1][PlanNodeId 4] => [custkey_1:bigint, max:double]                         >
                                                             max := "presto.default.max"((max_98)) (1:47)                                                  >
                                                         - LocalExchange[PlanNodeId 1650][HASH][$hashvalue_103] (custkey_1) => [custkey_1:bigint, max_98:do>
                                                             - RemoteSource[3] => [custkey_1:bigint, max_98:double, $hashvalue_104:bigint]                 >
                     - Project[PlanNodeId 1765][projectLocality = LOCAL] => [custkey_34:bigint, min_51:double, $hashvalue_112:bigint]                      >
                             $hashvalue_112 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_34), BIGINT'0')) (1:100)                       >
                         - Aggregate(FINAL)[custkey_34][PlanNodeId 34] => [custkey_34:bigint, min_51:double]                                               >
                                 min_51 := "presto.default.min"((min_99)) (1:73)                                                                           >
                             - LocalExchange[PlanNodeId 1666][HASH][$hashvalue_109] (custkey_34) => [custkey_34:bigint, min_99:double, $hashvalue_109:bigin>
                                 - RemoteSource[4] => [custkey_34:bigint, min_99:double, $hashvalue_110:bigint]                                            >
                                                                                                                                                           >
 Fragment 2 [SOURCE]                                                                                                                                       >
     Output layout: [custkey, totalprice, $hashvalue_102]                                                                                                  >
     Output partitioning: HASH [custkey][$hashvalue_102]                                                                                                   >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - ScanProject[PlanNodeId 0,1759][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzeP>
             Estimates: {source: CostBasedSourceInfo, rows: 15,000 (395.51kB), cpu: 270,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, >
             $hashvalue_102 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:294)                                          >
             LAYOUT: tpch.orders{}                                                                                                                         >
             custkey := custkey:bigint:1:REGULAR (1:294)                                                                                                   >
             totalprice := totalprice:double:3:REGULAR (1:294)                                                                                             >
                                                                                                                                                           >
 Fragment 3 [SOURCE]                                                                                                                                       >
     Output layout: [custkey_1, max_98, $hashvalue_105]                                                                                                    >
     Output partitioning: HASH [custkey_1][$hashvalue_105]                                                                                                 >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Project[PlanNodeId 1760][projectLocality = LOCAL] => [custkey_1:bigint, max_98:double, $hashvalue_105:bigint]                                       >
             $hashvalue_105 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_1), BIGINT'0')) (1:133)                                        >
         - Aggregate(PARTIAL)[custkey_1][PlanNodeId 1654] => [custkey_1:bigint, max_98:double]                                                             >
                 max_98 := "presto.default.max"((totalprice_3)) (1:47)                                                                                     >
             - TableScan[PlanNodeId 1][TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartitio>
                     Estimates: {source: CostBasedSourceInfo, rows: 15,000 (395.51kB), cpu: 270,000.00, memory: 0.00, network: 0.00}                       >
                     LAYOUT: tpch.orders{}                                                                                                                 >
                     custkey_1 := custkey:bigint:1:REGULAR (1:117)                                                                                         >
                     totalprice_3 := totalprice:double:3:REGULAR (1:117)                                                                                   >
                                                                                                                                                           >
 Fragment 4 [SOURCE]                                                                                                                                       >
     Output layout: [custkey_34, min_99, $hashvalue_111]                                                                                                   >
     Output partitioning: HASH [custkey_34][$hashvalue_111]                                                                                                >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Project[PlanNodeId 1764][projectLocality = LOCAL] => [custkey_34:bigint, min_99:double, $hashvalue_111:bigint]                                      >
             $hashvalue_111 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_34), BIGINT'0')) (1:133)                                       >
         - Aggregate(PARTIAL)[custkey_34][PlanNodeId 1670] => [custkey_34:bigint, min_99:double]                                                           >
                 min_99 := "presto.default.min"((totalprice_36)) (1:73)                                                                                    >
             - TableScan[PlanNodeId 31][TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartiti>
                     Estimates: {source: CostBasedSourceInfo, rows: 15,000 (395.51kB), cpu: 270,000.00, memory: 0.00, network: 0.00}                       >
                     LAYOUT: tpch.orders{}                                                                                                                 >
                     totalprice_36 := totalprice:double:3:REGULAR (1:117)                                                                                  >
                     custkey_34 := custkey:bigint:1:REGULAR (1:117)                                                                                        >
                                                                                                                                                           >
                                                                                                                                                           >
(1 row)

After fix with this PR

presto:tpch> explain (type distributed) with t1 as (select max(totalprice) maxprice, min(totalprice) minprice, custkey ckey from orders group by custkey), t2 as (select custkey, totalprice, (select maxprice from t1 where ckey = custkey) maxprice, (select minprice from t1 where ckey=custkey) minprice from orders) select custkey from t2 where (totalprice between minprice and maxprice) group by custkey;
                                                                                                                                                           >
----------------------------------------------------------------------------------------------------------------------------------------------------------->
 Fragment 0 [SINGLE]                                                                                                                                       >
     Output layout: [custkey]                                                                                                                              >
     Output partitioning: SINGLE []                                                                                                                        >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Output[PlanNodeId 74][custkey] => [custkey:bigint]                                                                                                  >
         - RemoteSource[1] => [custkey:bigint]                                                                                                             >
                                                                                                                                                           >
 Fragment 1 [HASH]                                                                                                                                         >
     Output layout: [custkey]                                                                                                                              >
     Output partitioning: SINGLE []                                                                                                                        >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Aggregate[custkey][PlanNodeId 69] => [custkey:bigint]                                                                                               >
         - FilterProject[PlanNodeId 912,693][filterPredicate = (SWITCH(is_distinct, WHEN(BOOLEAN'true', BOOLEAN'true'), CAST(fail(INTEGER'28', VARCHAR'Scal>
             - Project[PlanNodeId 1767][projectLocality = LOCAL] => [custkey:bigint, totalprice:double, max:double, unique:bigint, min_51:double, is_distin>
                 - MarkDistinct[PlanNodeId 691][distinct=custkey:bigint, totalprice:double, max:double, unique:bigint marker=is_distinct][$hashvalue_113] =>
                     - Project[PlanNodeId 1766][projectLocality = LOCAL] => [custkey:bigint, totalprice:double, max:double, unique:bigint, min_51:double, $>
                             $hashvalue_113 := combine_hash(combine_hash(combine_hash(combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT>
                         - LeftJoin[PlanNodeId 702][("custkey" = "custkey_34")][$hashvalue_108, $hashvalue_112] => [custkey:bigint, totalprice:double, max:>
                                 Distribution: PARTITIONED                                                                                                 >
                             - AssignUniqueId[PlanNodeId 690] => [custkey:bigint, totalprice:double, max:double, $hashvalue_108:bigint, unique:bigint]     >
                                 - FilterProject[PlanNodeId 909,711][filterPredicate = SWITCH(is_distinct_97, WHEN(BOOLEAN'true', BOOLEAN'true'), CAST(fail>
                                         $hashvalue_108 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:294)              >
                                     - Project[PlanNodeId 1763][projectLocality = LOCAL] => [custkey:bigint, totalprice:double, unique_96:bigint, max:doubl>
                                         - MarkDistinct[PlanNodeId 709][distinct=custkey:bigint, totalprice:double, unique_96:bigint marker=is_distinct_97]>
                                             - LocalExchange[PlanNodeId 1619][HASH][$hashvalue] (custkey) => [custkey:bigint, totalprice:double, unique_96:>
                                                 - Project[PlanNodeId 1762][projectLocality = LOCAL] => [custkey:bigint, totalprice:double, $hashvalue_101:>
                                                         $hashvalue_107 := combine_hash(combine_hash(combine_hash(BIGINT'0', COALESCE($operator$hash_code(c>
                                                     - LeftJoin[PlanNodeId 720][("custkey" = "custkey_1")][$hashvalue_101, $hashvalue_106] => [custkey:bigi>
                                                             Distribution: PARTITIONED                                                                     >
                                                         - AssignUniqueId[PlanNodeId 708] => [custkey:bigint, totalprice:double, $hashvalue_101:bigint, uni>
                                                                 Estimates: {source: CostBasedSourceInfo, rows: 15,000 (131.84kB), cpu: 1,215,000.00, memor>
                                                             - RemoteSource[2] => [custkey:bigint, totalprice:double, $hashvalue_101:bigint]               >
                                                         - Project[PlanNodeId 1761][projectLocality = LOCAL] => [custkey_1:bigint, max:double, $hashvalue_1>
                                                                 $hashvalue_106 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_1), BIGINT'>
                                                             - Aggregate(FINAL)[custkey_1][PlanNodeId 4] => [custkey_1:bigint, max:double]                 >
                                                                     max := "presto.default.max"((max_98)) (1:47)                                          >
                                                                 - LocalExchange[PlanNodeId 1650][HASH][$hashvalue_103] (custkey_1) => [custkey_1:bigint, m>
                                                                     - RemoteSource[3] => [custkey_1:bigint, max_98:double, $hashvalue_104:bigint]         >
                             - Project[PlanNodeId 1765][projectLocality = LOCAL] => [custkey_34:bigint, min_51:double, $hashvalue_112:bigint]              >
                                     $hashvalue_112 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_34), BIGINT'0')) (1:100)               >
                                 - Aggregate(FINAL)[custkey_34][PlanNodeId 34] => [custkey_34:bigint, min_51:double]                                       >
                                         min_51 := "presto.default.min"((min_99)) (1:73)                                                                   >
                                     - LocalExchange[PlanNodeId 1666][HASH][$hashvalue_109] (custkey_34) => [custkey_34:bigint, min_99:double, $hashvalue_1>
                                         - RemoteSource[4] => [custkey_34:bigint, min_99:double, $hashvalue_110:bigint]                                    >
                                                                                                                                                           >
 Fragment 2 [SOURCE]                                                                                                                                       >
     Output layout: [custkey, totalprice, $hashvalue_102]                                                                                                  >
     Output partitioning: HASH [custkey][$hashvalue_102]                                                                                                   >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - ScanProject[PlanNodeId 0,1759][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzeP>
             Estimates: {source: CostBasedSourceInfo, rows: 15,000 (395.51kB), cpu: 270,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, >
             $hashvalue_102 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:294)                                          >
             LAYOUT: tpch.orders{}                                                                                                                         >
             custkey := custkey:bigint:1:REGULAR (1:294)                                                                                                   >
             totalprice := totalprice:double:3:REGULAR (1:294)                                                                                             >
                                                                                                                                                           >
 Fragment 3 [SOURCE]                                                                                                                                       >
     Output layout: [custkey_1, max_98, $hashvalue_105]                                                                                                    >
     Output partitioning: HASH [custkey_1][$hashvalue_105]                                                                                                 >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Project[PlanNodeId 1760][projectLocality = LOCAL] => [custkey_1:bigint, max_98:double, $hashvalue_105:bigint]                                       >
             $hashvalue_105 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_1), BIGINT'0')) (1:133)                                        >
         - Aggregate(PARTIAL)[custkey_1][PlanNodeId 1654] => [custkey_1:bigint, max_98:double]                                                             >
                 max_98 := "presto.default.max"((totalprice_3)) (1:47)                                                                                     >
             - TableScan[PlanNodeId 1][TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartitio>
                     Estimates: {source: CostBasedSourceInfo, rows: 15,000 (395.51kB), cpu: 270,000.00, memory: 0.00, network: 0.00}                       >
                     LAYOUT: tpch.orders{}                                                                                                                 >
                     custkey_1 := custkey:bigint:1:REGULAR (1:117)                                                                                         >
                     totalprice_3 := totalprice:double:3:REGULAR (1:117)                                                                                   >
                                                                                                                                                           >
 Fragment 4 [SOURCE]                                                                                                                                       >
     Output layout: [custkey_34, min_99, $hashvalue_111]                                                                                                   >
     Output partitioning: HASH [custkey_34][$hashvalue_111]                                                                                                >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                         >
     - Project[PlanNodeId 1764][projectLocality = LOCAL] => [custkey_34:bigint, min_99:double, $hashvalue_111:bigint]                                      >
             $hashvalue_111 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_34), BIGINT'0')) (1:133)                                       >
         - Aggregate(PARTIAL)[custkey_34][PlanNodeId 1670] => [custkey_34:bigint, min_99:double]                                                           >
                 min_99 := "presto.default.min"((totalprice_36)) (1:73)                                                                                    >
             - TableScan[PlanNodeId 31][TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartiti>
                     Estimates: {source: CostBasedSourceInfo, rows: 15,000 (395.51kB), cpu: 270,000.00, memory: 0.00, network: 0.00}                       >
                     LAYOUT: tpch.orders{}                                                                                                                 >
                     totalprice_36 := totalprice:double:3:REGULAR (1:117)                                                                                  >
                     custkey_34 := custkey:bigint:1:REGULAR (1:117)                                                                                        >
                                                                                                                                                           >
                                                                                                                                                           >
(1 row)

After debugging, it's due to the fact that when spill is enabled, the order sensitive local properties will be removed. However, the local properties has hierarchy and cannot be simply removed. In this example, the properties are G(unique), C(custkey), C(totalprice) where unique is the unique key generated by the AssignUniqueId operator. Current logic will simply remove the G(unique) property and keeping the rest of the two. This is not right as the rest will not hold true after the first is removed.
This PR fix it by discarding the rest of properties if any previous properties are removed.

Motivation and Context

Bug fix

Impact

Fix potential incorrect plan

Test Plan

Add unit tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix bug in local property calculation when spill is enabled :pr:`23922`

@feilong-liu feilong-liu marked this pull request as draft October 31, 2024 00:47
@@ -323,7 +322,16 @@ public ActualProperties build()
{
List<LocalProperty<VariableReferenceExpression>> localProperties = this.localProperties;
if (unordered) {
localProperties = filteredCopy(this.localProperties, property -> !property.isOrderSensitive());
ImmutableList.Builder<LocalProperty<VariableReferenceExpression>> newLocalProperties = ImmutableList.builder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native side doesn't need any change on the query shape if the spill is enabled. So can we fix in that way? I assume the query shape without spill enabled is correct for native but only when join spill session property is enabled? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use the same plan, join spill algorithm needs to preserver order i.e. output is the same as when no spill.

@@ -323,7 +322,16 @@ public ActualProperties build()
{
List<LocalProperty<VariableReferenceExpression>> localProperties = this.localProperties;
if (unordered) {
localProperties = filteredCopy(this.localProperties, property -> !property.isOrderSensitive());
ImmutableList.Builder<LocalProperty<VariableReferenceExpression>> newLocalProperties = ImmutableList.builder();
for (LocalProperty<VariableReferenceExpression> property : this.localProperties) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know G(unique) is the first in this case? and are we able to guarantee that?

Copy link
Contributor Author

@feilong-liu feilong-liu Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we do not need to guarantee this, as which one is earlier depends on other code which produce this property (in this case it's the property generation code for AssignUniqueId) and not related to this part. This code only needs to make sure that properties after removal is still valid

Comment on lines +327 to +332
if (!property.isOrderSensitive()) {
newLocalProperties.add(property);
}
else {
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what these props mean.
Here we are discarding all properties after and including the 1st order sensitive one.
Is this the intent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the intent. It also takes me some time to understand this part of logic, the property is hierarchical and if one is invalid, all following will also be invalid

@feilong-liu feilong-liu marked this pull request as ready for review October 31, 2024 17:36
@feilong-liu feilong-liu requested a review from elharo as a code owner October 31, 2024 17:36
Copy link
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a unit test also? Maybe to TestLocalProperties

Also, for anyone else who is confused, want to share a more detailed explanation that I got from conversation with @feilong-liu:

Without spill, we have the properties G(unique), C(custkey), C(totalprice) where unique is the unique key generated by the AssignUniqueId operator. That means that we are grouped by the unique key, and within each group, custkey and totalprice are constant. (if two rows have the same value for unique, they will have the same value for custkey and total price).

when spill is enabled, we assume that the order of rows is not maintained, so we mark it as unordered (https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/PropertyDerivations.java#L394). When this field is set, we were removing any order sensitive properties from the list. In this case G(unique) is considered order sensitive, so we were removing that. However, if you remove the G(unique) property, but keep the C(custkey) and C(totalprice) properties, the planner will think that custkey and totalprice are always constant, which is not true. And that's what resulted in the incorrect plan.

Instead, you need to remove all properties following the order sensitive one because the properties are hierarchical, so the following properties are defined within the context of the previous ones.

@feilong-liu feilong-liu merged commit 9e0c295 into prestodb:master Nov 1, 2024
56 checks passed
@feilong-liu feilong-liu deleted the fix_local_exchange branch November 1, 2024 16:16
@jaystarshot jaystarshot mentioned this pull request Nov 1, 2024
25 tasks
@tdcmeehan tdcmeehan added the from:Meta PR from Meta label Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:Meta PR from Meta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants