@ulysses-you ulysses-you commented Jan 11, 2022

What changes were proposed in this pull request?

Skip aliasing any ExtractValue whose children contain a NamedLambdaVariable.

Why are the changes needed?

Since #32773, a NamedLambdaVariable produces references. As a side effect, the rule NestedColumnAliasing now tries to alias an ExtractValue that contains a NamedLambdaVariable. This fails because a NamedLambdaVariable cannot be matched to an actual attribute.

In more detail:
NestedColumnAliasing#replaceWithAliases uses the references of each nestedField to match the output attributes of the grandchildren. However, a NamedLambdaVariable is created by the analyzer as a virtual attribute and is not resolved from the output of any child. So matching the references of a NamedLambdaVariable against the grandchildren's output finds no attribute.
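The mismatch described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: `Attr`, `LambdaVar`, `GetField`, `match_output`, and `should_alias` are hypothetical stand-ins for an attribute, `NamedLambdaVariable`, an `ExtractValue`, the output-matching step, and the skip condition. It assumes the lambda variable reports a virtual attribute with its own id as its reference.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an attribute resolved from a child plan's output."""
    name: str
    expr_id: int

    @property
    def references(self) -> FrozenSet["Attr"]:
        return frozenset({self})

    def contains_lambda_var(self) -> bool:
        return False


@dataclass(frozen=True)
class LambdaVar:
    """Toy stand-in for NamedLambdaVariable (created by the analyzer)."""
    name: str
    expr_id: int

    @property
    def references(self) -> FrozenSet[Attr]:
        # Assumption: the variable reports a virtual attribute with its own id,
        # which never appears in the output of any child plan.
        return frozenset({Attr(self.name, self.expr_id)})

    def contains_lambda_var(self) -> bool:
        return True


@dataclass(frozen=True)
class GetField:
    """Toy stand-in for an ExtractValue such as a struct field access."""
    child: Union[Attr, LambdaVar, "GetField"]
    field: str

    @property
    def references(self) -> FrozenSet[Attr]:
        return self.child.references

    def contains_lambda_var(self) -> bool:
        return self.child.contains_lambda_var()


def match_output(nested_field: GetField, grandchild_output: List[Attr]) -> List[Attr]:
    """Return the output attributes that the nested field refers to."""
    refs = nested_field.references
    return [attr for attr in grandchild_output if attr in refs]


def should_alias(nested_field: GetField) -> bool:
    """The skip condition: never alias a field access built on a lambda variable."""
    return not nested_field.contains_lambda_var()


output = [Attr("s", 1)]                         # output of the grandchild plan
on_column = GetField(Attr("s", 1), "a")         # s.a, resolved from the output
on_lambda = GetField(LambdaVar("x", 99), "a")   # x.a inside a lambda body

print(match_output(on_column, output))  # finds the attribute for s
print(match_output(on_lambda, output))  # finds nothing, so aliasing cannot work
print(should_alias(on_column), should_alias(on_lambda))
```

In this model the lookup for `x.a` comes back empty, which is the situation the skip condition avoids by never offering such an expression as an aliasing candidate.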

Does this PR introduce any user-facing change?

Yes, it is a bug fix.

How was this patch tested?

Added a new test.

@github-actions github-actions bot added the SQL label Jan 11, 2022
@HyukjinKwon

cc @viirya FYI

@viirya viirya left a comment

Looks reasonable. Could you also mention when it fails to match the attribute in the description? Thanks.

@ulysses-you

@viirya I have updated the description; I hope it is clear now.

viirya commented Jan 12, 2022

Thanks. As #32773 was also merged to 3.1, is this an issue on branch-3.1 too? @ulysses-you

@ulysses-you

I think landing this on branch-3.2 is enough, since the backport to branch-3.1 was reverted.
see https://github.com/apache/spark/blob/branch-3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala

viirya commented Jan 12, 2022

Okay, thanks! Merging to master.

@viirya viirya closed this in 189b205 Jan 12, 2022
viirya commented Jan 12, 2022

Oh, there is a conflict. @ulysses-you Can you submit a backport PR to branch-3.2? Thanks.

@ulysses-you

Thank you @viirya, I created #35175.

@ulysses-you ulysses-you deleted the SPARK-37855 branch January 12, 2022 05:30
viirya pushed a commit that referenced this pull request Jan 12, 2022
…ray inside a nested struct

This is a backport of #35170 for branch-3.2.

Closes #35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
dchvn pushed a commit to dchvn/spark that referenced this pull request Jan 19, 2022
…nside a nested struct

Closes apache#35170 from ulysses-you/SPARK-37855.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…ray inside a nested struct

This is a backport of apache#35170 for branch-3.2.

Closes apache#35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…ray inside a nested struct

This is a backport of apache#35170 for branch-3.2.

Closes apache#35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…ray inside a nested struct

This is a backport of apache#35170 for branch-3.2.

Closes apache#35175 from ulysses-you/SPARK-37855-branch-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit a58b8a8)
Signed-off-by: Dongjoon Hyun <[email protected]>
eejbyfeldt pushed a commit to eejbyfeldt/spark that referenced this pull request Jun 24, 2024
apache#35170 (SPARK-37855) and apache#32301 (SPARK-35194) introduced
conditions for ExtractValues that currently cannot be handled. The
condition is applied after `collectRootReferenceAndExtractValue` and just
removes these candidates. This is problematic since these expressions
might have contained an `AttributeReference` that had to be accounted for
to avoid an incorrect rewrite. This fixes this family of bugs by moving
the conditions into the function `collectRootReferenceAndExtractValue`.
cloud-fan pushed a commit that referenced this pull request Jun 27, 2024
### What changes were proposed in this pull request?

#35170 (SPARK-37855) and #32301 (SPARK-35194) introduced conditions for ExtractValues that currently cannot be handled. The condition is applied after `collectRootReferenceAndExtractValue` and just removes these candidates. This is problematic since these expressions might have contained an `AttributeReference` that had to be accounted for to avoid an incorrect aliasing. This fixes this family of bugs by moving the conditions into the function `collectRootReferenceAndExtractValue`.
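The difference between filtering after collection and checking the condition during collection can be sketched with a toy model. This is plain Python, not the Spark implementation: `Attr`, `LambdaVar`, `Extract`, `collect_then_filter`, and `collect_with_condition` are hypothetical names, and the sketch assumes that an extractor rejected by the condition falls through to a recursion into its children, which is one plausible reading of the change described above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for AttributeReference."""
    name: str


@dataclass(frozen=True)
class LambdaVar:
    """Toy stand-in for NamedLambdaVariable."""
    name: str


@dataclass(frozen=True)
class Extract:
    """Toy stand-in for an ExtractValue with one or more children."""
    children: Tuple["Expr", ...]
    label: str


Expr = Union[Attr, LambdaVar, Extract]


def contains_lambda(e: Expr) -> bool:
    if isinstance(e, LambdaVar):
        return True
    if isinstance(e, Extract):
        return any(contains_lambda(c) for c in e.children)
    return False


def collect_then_filter(e: Expr) -> List[Expr]:
    """Old shape: collect candidates first, then drop the unsupported ones."""
    collected = [e] if isinstance(e, (Attr, Extract)) else []
    return [c for c in collected if not contains_lambda(c)]


def collect_with_condition(e: Expr) -> List[Expr]:
    """New shape (assumed): an unsupported extractor is not a candidate,
    but its children are still searched for references."""
    if isinstance(e, Attr):
        return [e]
    if isinstance(e, Extract):
        if not contains_lambda(e):
            return [e]
        return [r for c in e.children for r in collect_with_condition(c)]
    return []


# col.items indexed by a lambda variable i: mentions both col and i.
items = Extract((Attr("col"),), "items")
indexed = Extract((items, LambdaVar("i")), "[i]")

print(collect_then_filter(indexed))     # nothing left: the use of col is lost
print(collect_with_condition(indexed))  # col.items is still reported
```

In the first variant the reference to `col` disappears together with the rejected candidate, so later steps can no longer see that `col` is in use. In the second variant the reference survives, which is the property the description relies on to avoid an incorrect aliasing.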

### Why are the changes needed?

The current code leads to `IllegalStateException` runtime failures.

### Does this PR introduce _any_ user-facing change?

Yes, fixes a bug.

### How was this patch tested?

Existing and new unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46756 from eejbyfeldt/SPARK-48428.

Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Jun 27, 2024
Closes #46756 from eejbyfeldt/SPARK-48428.

Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b11608c)
Signed-off-by: Wenchen Fan <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
Closes apache#46756 from eejbyfeldt/SPARK-48428.

Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b11608c)
Signed-off-by: Wenchen Fan <[email protected]>