-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-36073][SQL] EquivalentExpressions fixes and improvements #33281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36073][SQL] EquivalentExpressions fixes and improvements #33281
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #140857 has started for PR 33281 at commit |
|
Test build #140852 has finished for PR 33281 at commit
|
|
Kubernetes integration test starting |
|
cc @cloud-fan, @Kimahriman, @maropu, @viirya |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the behavior of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test actually proves that the new updateExprTree(stats.expr, localEquivalenceMap, -stats.useCount) approach fixes a bug as well.
Before this PR equivalence.getAllExprStates(1) didn't return anything because the notChild filter at https://github.com/apache/spark/pull/33281/files#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eL90-L92 filtered out all child expressions (including add) of the common expression ifExpr1. But it should filter out only children "defined by" childrenToRecurse() and commonChildrenToRecurse(). In this PR when we remove ifExpr1 from localEquivalenceMap we keep add in localEquivalenceMap. Then we add add to map (in the 2nd iteration of the loop) and so add will have useCount = 2 in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does transparently canonicalized mean PromotePrecision(add).canonicalized == add.canonicalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe I could rephrase this if it doesn't make sense.
Could you explain more about the three points listed in the description? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I got your idea here. For performance, is there significant difference? The current approach is simpler. The filtering in localEquivalenceMap is based on height so it should be very fast. Is this still a performance bottleneck?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that this change complicates the computation of useCount. It will be harder to debug in the future. Before we are not certain that this is a real performance bottleneck, this may look like a premature optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I'm not sure how significant the difference is. I think if we have deep expressions in localEquivalenceMap then filtering by height (before this PR) might not help a lot.
The new code in this PR might look a bit complicated at first, but actually it is very simple, we just remove expressions from the localEquivalenceMap with the reverse of addExprTree().
This new approach also fixes a bug tested here: https://github.com/apache/spark/pull/33281/files#r667343014
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'd rather consider it as an improvement as it doesn't cause query failure or codegen failure, though it fails to identify a common subexpression.
That's said, we don't need to hurry on this for 3.2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, I've changed ticket type to improvement.
Sure. I've updated the description and added some examples. |
|
Gentle ping @viirya, @cloud-fan. |
|
@viirya, @cloud-fan I wonder if we can move forward with this? |
|
Test build #144726 has finished for PR 33281 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's follow SPARK-33539 and add a new method in QueryExecutionErrors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @allisonwang-db. Added it in 227cad1
7786c0c to
227cad1
Compare
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #145034 has finished for PR 33281 at commit
|
| map -= wrapper | ||
| false | ||
| } else { | ||
| // Should not happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this should not happen, I think throwing IllegalStateException is better as it's a bug. QueryExecutionErrors is for user-facing errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing IllegalStateException looks reasonable to me. Fixed in c7c7016
| map.put(wrapper, ExpressionStats(expr)(useCount)) | ||
| } else { | ||
| // Should not happen | ||
| throw QueryExecutionErrors.updateEquivalentExpressionsError(expr, map, useCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in c7c7016
7c4d506 to
c7c7016
Compare
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #145069 has finished for PR 33281 at commit
|
|
thanks, merging to master! |
|
Thanks for the review @cloud-fan! |
What changes were proposed in this pull request?
This PR:
addwill be common subexpression in the following example:PromotePrecision) are considered common subexpressions.After this PR,
transparentwill not be common subexpression in the following example:Why are the changes needed?
Bugfix + performance improvement.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing + new UTs.