@zhztheplayer zhztheplayer commented Jun 16, 2025

https://issues.apache.org/jira/browse/SPARK-52484

What changes were proposed in this pull request?

This PR removes the unnecessary child.supportsColumnar assertion in ColumnarToRowExec, introduced by #25264, to give third-party Spark plugins more flexibility. In Apache Gluten in particular, the assertion blocks some of our query optimization work, because we need intermediate query plan states that Spark may consider illegal.

Typical reasons Gluten needs this intermediate state:

  1. Gluten has a cost evaluator API to evaluate the cost of a transition rule (which adds a unary node on top of an input plan). In that case, Gluten needs a fake leaf for the rule to apply to during cost evaluation. This leaf node has to be made columnar to bypass the assertion, which is a bit hacky.
  2. Gluten has a cascades-style query optimizer (RAS) that can use a dummy, row-based leaf plan node to hide the child tree of a branch query plan node; the leaf then represents a so-called cascades 'group'. Although this pattern (C2R on a row-based plan) is illegal, it can still serve as the input of an optimizer rule, which may match it and convert it into a valid query plan.
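
To make the second reason concrete, here is a minimal, self-contained sketch of a rule that takes the "illegal" shape as input and produces a valid plan. It is a toy model only: `Plan`, `GroupLeaf`, `C2R`, `ColumnarScan`, `RowScan`, and `resolve` are hypothetical names invented for illustration and are neither Spark's nor Gluten's actual API.

```scala
object GroupLeafSketch {
  // Toy stand-in for a physical plan node (hypothetical, not SparkPlan).
  sealed trait Plan { def supportsColumnar: Boolean }

  case class ColumnarScan(table: String) extends Plan { val supportsColumnar = true }
  case class RowScan(table: String) extends Plan { val supportsColumnar = false }

  // Hypothetical row-based placeholder that stands for a cascades group.
  case class GroupLeaf(groupId: Int) extends Plan { val supportsColumnar = false }

  // Columnar-to-row transition. Construction is deliberately unchecked so
  // that C2R(GroupLeaf(_)) can exist as an intermediate state.
  case class C2R(child: Plan) extends Plan { val supportsColumnar = false }

  // A rule that matches the intermediate pattern and rewrites it into a
  // valid plan once the group is resolved to a concrete member.
  def resolve(plan: Plan, groups: Map[Int, Plan]): Plan = plan match {
    case C2R(GroupLeaf(id)) =>
      groups(id) match {
        case member if member.supportsColumnar => C2R(member) // valid transition
        case member => member // already row-based, no transition needed
      }
    case other => other
  }
}
```

In this sketch the intermediate node `C2R(GroupLeaf(1))` never executes; it only exists long enough for `resolve` to match it. An assertion in the `C2R` constructor would reject it before the rule ever saw it.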

This PR removes the assertion to give third-party plugins this flexibility. It should do no harm to upstream Apache Spark: even without the assertion, executing an illegal query plan will still fail with this error.
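
The difference between failing at construction and failing at execution can be sketched with a toy model. Everything below is an assumption made for illustration: `Plan`, `RowLeaf`, `EagerC2R`, and `LazyC2R` are hypothetical stand-ins, and the exception type and message are placeholders rather than what Spark actually throws.

```scala
object C2RSketch {
  // Toy stand-in for a physical plan node (hypothetical, not SparkPlan).
  trait Plan {
    def supportsColumnar: Boolean
    def execute(): Seq[String]
  }

  // A row-based leaf, such as a placeholder created by an optimizer.
  case class RowLeaf(rows: Seq[String]) extends Plan {
    val supportsColumnar = false
    def execute(): Seq[String] = rows
  }

  // With the assertion: an illegal plan cannot even be constructed.
  case class EagerC2R(child: Plan) extends Plan {
    assert(child.supportsColumnar)
    val supportsColumnar = false
    def execute(): Seq[String] = child.execute()
  }

  // Without the assertion: construction succeeds, and the mismatch is
  // reported only if the illegal plan is actually executed.
  case class LazyC2R(child: Plan) extends Plan {
    val supportsColumnar = false
    def execute(): Seq[String] = {
      if (!child.supportsColumnar) {
        // Placeholder error; the real Spark exception differs.
        throw new IllegalStateException("child has no columnar support")
      }
      child.execute()
    }
  }
}
```

Here `EagerC2R(RowLeaf(...))` throws an `AssertionError` as soon as it is built, while `LazyC2R(RowLeaf(...))` can be built, inspected, and rewritten by rules, and only throws if `execute()` is called on it.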

Some workarounds used by Gluten for bypassing this assertion:

  1. https://github.com/apache/incubator-gluten/blob/0a1b5c28678653242ab0fd7b28ebba1dca43ccb1/gluten-core/src/main/scala/org/apache/gluten/extension/columnar/transition/package.scala#L83
  2. https://github.com/apache/incubator-gluten/blob/0a1b5c28678653242ab0fd7b28ebba1dca43ccb1/gluten-core/src/main/scala/org/apache/gluten/extension/columnar/enumerated/planner/plan/GlutenPlanModel.scala#L51-L55

Once the assertion is removed, Gluten will be able to drop these workarounds and simplify the code.

Does this PR introduce any user-facing change?

Basically no. When an illegal query plan is generated, the assertion error at plan-building time is replaced by an exception at execution time (still on the driver side).

How was this patch tested?

Existing UTs.

@github-actions github-actions bot added the SQL label Jun 16, 2025
@zhztheplayer zhztheplayer changed the title [SPARK-52484] Remove child.supportsColumnar assertion in ColumnarToRo… [SPARK-52484][SQL] Remove child.supportsColumnar assertion in ColumnarToRo… Jun 16, 2025
@zhztheplayer
Member Author

@cloud-fan @revans2

Would you mind taking a look? Thanks!

@zhztheplayer zhztheplayer changed the title [SPARK-52484][SQL] Remove child.supportsColumnar assertion in ColumnarToRo… [SPARK-52484][SQL] Remove child.supportsColumnar assertion in ColumnarToRowExec Jun 16, 2025
@zhztheplayer zhztheplayer changed the title [SPARK-52484][SQL] Remove child.supportsColumnar assertion in ColumnarToRowExec [SPARK-52484][SQL] Skip child.supportsColumnar assertion from driver side in ColumnarToRowExec Jun 16, 2025
@dongjoon-hyun dongjoon-hyun dismissed their stale review June 16, 2025 23:31

Addressed.


@yaooqinn yaooqinn left a comment

This removal looks reasonable to me since we have a much more well-defined error than this vague assert message.

@yaooqinn yaooqinn closed this in 722d02c Jun 18, 2025
@yaooqinn
Member

Merged to master, thank you @zhztheplayer @dongjoon-hyun

@zhztheplayer
Member Author

Thanks everyone for reviewing!
