

@liuzqt
Contributor

@liuzqt liuzqt commented Feb 7, 2024

What changes were proposed in this pull request?

#43435 and #43760 fixed a correctness issue that is triggered when AQE is applied to a cached query plan, specifically when AQE coalesces the final result stage of the cached plan.

The current semantics of spark.sql.optimizer.canChangeCachedPlanOutputPartitioning (source code) are:

when true (the default), we enable AQE but disable coalescing of the final stage
when false, we disable AQE

But let's revisit the semantics of this config: for the caller, the only thing that matters is whether we change the output partitioning of the cached plan, and we should apply AQE whenever possible. Thus we want to change the semantics of spark.sql.optimizer.canChangeCachedPlanOutputPartitioning to:

when true, we enable AQE and allow coalescing of the final stage: this might cause a performance regression, because changing the cached plan's output partitioning can introduce an extra shuffle
when false, we enable AQE but disable coalescing of the final stage (this is what the old behavior actually did when the config was true)
Also, to keep the default behavior unchanged, we want to flip the default value of spark.sql.optimizer.canChangeCachedPlanOutputPartitioning to false.
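As a minimal sketch, the old and new semantics can be modeled as a mapping from the config value to two facts: whether AQE runs on the cached plan, and whether it may coalesce the final stage. This is a hypothetical Python model, not Spark's actual implementation; the names `CachedPlanAqeBehavior`, `old_semantics`, `new_semantics`, `OLD_DEFAULT`, and `NEW_DEFAULT` are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical model of the semantics described in this PR (not Spark source code).
CONFIG_KEY = "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning"


@dataclass(frozen=True)
class CachedPlanAqeBehavior:
    aqe_enabled: bool           # does AQE run on the cached plan?
    coalesce_final_stage: bool  # may AQE coalesce (repartition) the final stage?


def old_semantics(can_change: bool) -> CachedPlanAqeBehavior:
    # Before: true = AQE on, final-stage coalescing off; false = AQE off entirely.
    if can_change:
        return CachedPlanAqeBehavior(aqe_enabled=True, coalesce_final_stage=False)
    return CachedPlanAqeBehavior(aqe_enabled=False, coalesce_final_stage=False)


def new_semantics(can_change: bool) -> CachedPlanAqeBehavior:
    # After: AQE is always on; the flag only controls whether the
    # cached plan's output partitioning may change via final-stage coalescing.
    return CachedPlanAqeBehavior(aqe_enabled=True, coalesce_final_stage=can_change)


# Assumed defaults before and after the proposed flip.
OLD_DEFAULT = True
NEW_DEFAULT = False

if __name__ == "__main__":
    # Flipping the default keeps the out-of-the-box behavior identical.
    assert old_semantics(OLD_DEFAULT) == new_semantics(NEW_DEFAULT)
    print(CONFIG_KEY, "=", NEW_DEFAULT, "->", new_semantics(NEW_DEFAULT))
```

The model makes the design choice visible: under the new semantics, `false` no longer turns AQE off, so users who previously set it to `false` still get AQE on cached plans, just without the partitioning change.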

Why are the changes needed?

To allow AQE to coalesce the final stage of a SQL cached plan, and to make the semantics of spark.sql.optimizer.canChangeCachedPlanOutputPartitioning more reasonable.

Does this PR introduce any user-facing change?

How was this patch tested?

Updated unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Feb 7, 2024
@liuzqt
Contributor Author

liuzqt commented Feb 7, 2024

@cloud-fan @maryannxue Please help review this change, thanks!

@cloud-fan
Contributor

cc @yaooqinn

@github-actions github-actions bot added the PYTHON label Feb 7, 2024
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in becc04a Feb 7, 2024
@cloud-fan
Contributor

@liuzqt can you correct the JIRA ticket ID? It seems wrong.

@liuzqt liuzqt changed the title [SPARK-46996][SQL] Allow AQE coalesce final stage in SQL cached plan [SPARK-46995][SQL] Allow AQE coalesce final stage in SQL cached plan Feb 7, 2024


3 participants