[SPARK-41214][SQL] Fix AQE cache does not update plan and metrics #39037
Thanks @ulysses-you for this fix.
does this test check metrics?
added one more assert with SparkListenerSQLAdaptiveSQLMetricUpdates
Force-pushed from fb4be5f to 60ac065.
Force-pushed from 60ac065 to 96c3302.
AdaptiveSparkPlan nodes are injected in the following cases:
1- At the parent query level, as the root node of the SparkPlan,
2- For AQE under InMemoryRelation,
3- For subqueries.
Does it make sense for the UT to also cover the subquery and AQE-under-IMR cases?
The metrics of HashAggregateExec nodes were empty before the fix:
https://issues.apache.org/jira/secure/attachment/13052914/DAG%20when%20AQE%3DON%20and%20AQECachedDFSupport%3DON%20without%20fix.png
Does it make sense to also verify the HashAggregateExec metric(s) (for the nodes that come before the InMemoryRelation nodes) to make the test more robust? For example, the HashAggregateExec "number of output rows" metric does not change between test runs.
Hi @ulysses-you and @cloud-fan,
close this in favor of #39624
What changes were proposed in this pull request?
This PR fixes two issues that occur when AQE is enabled for cached plans.
We do not propagate SQL metrics if the current execution id does not map to the current query execution. The AdaptiveSparkPlanExec inside InMemoryTableScan does not have its own query execution id, so those SQL metrics were missed.
A simple case is:
We only update the final plan if it contains a subquery, to avoid unnecessary SparkListenerSQLAdaptiveExecutionUpdate events. However, AdaptiveSparkPlanExec also skipped the update when the final stage includes an InMemoryTableScan.
A simple case is:
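The following Scala sketch is a hypothetical reproduction, not the snippet from this PR. It caches an aggregation so that the cached plan needs a shuffle, then reads the cache back so that the final stage of the outer query is an InMemoryTableScan. It assumes that "cache enable AQE" refers to setting `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` to `true`; the object name, data, and column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical reproduction sketch; names and data are illustrative only.
object AqeCacheRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("aqe-cache-repro")
      .config("spark.sql.adaptive.enabled", "true")
      // Assumption: this is the setting that lets the cached plan itself use AQE.
      .config("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "true")
      .getOrCreate()

    // The aggregation requires a shuffle, so the plan stored in the cache
    // is expected to be wrapped in an AdaptiveSparkPlanExec.
    val cached = spark.range(0, 1000)
      .groupBy((col("id") % 10).as("k"))
      .count()
      .cache()

    // Materializes the cache. The outer query reads from an InMemoryTableScan.
    cached.count()

    // Prints the physical plan so it can be compared with the SQL tab of the Spark UI.
    cached.explain()

    spark.stop()
  }
}
```

To inspect the plan and metrics in the Spark UI, run the same statements in `spark-shell` instead, since `spark.stop()` shuts the UI down as soon as the program finishes.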
Why are the changes needed?
Correct the plan and metrics when AQE is enabled for cached plans, so that the Spark UI displays them properly.
Does this PR introduce any user-facing change?
Yes. After this PR, the Spark UI shows the correct plan and metrics when AQE is used with cached plans.
How was this patch tested?
Added a test and tested manually.