Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Apr 28, 2020

What changes were proposed in this pull request?

This reverts commit 5631a96.

Closes #28328

Why are the changes needed?

The PR #25754 introduced a bug in isInCollection. For example, if the SQL config spark.sql.optimizer.inSetConversionThresholdis set to 10 (by default):

val set = (0 to 20).map(_.toString).toSet
val data = Seq("1").toDF("x")
data.select($"x".isInCollection(set).as("isInCollection")).show()

The function must return 'true' because "1" is in the set of "0" ... "20" but it returns "false":

+--------------+
|isInCollection|
+--------------+
|         false|
+--------------+

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

$ ./build/sbt "test:testOnly *ColumnExpressionSuite"

…with a large size collection"

This reverts commit 5631a96.
@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 28, 2020

@HyukjinKwon @cloud-fan @gatorsmile Please, take a look at the PR.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 28, 2020

also cc @WeichenXu123 @dongjoon-hyun @maropu

@SparkQA
Copy link

SparkQA commented Apr 28, 2020

Test build #121969 has finished for PR 28388 at commit 6ca93fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in b7cabc8 Apr 28, 2020
cloud-fan pushed a commit that referenced this pull request Apr 28, 2020
…n.isInCollection() with a large size collection"

### What changes were proposed in this pull request?
This reverts commit 5631a96.

Closes #28328

### Why are the changes needed?
The PR  #25754 introduced a bug in `isInCollection`. For example, if the SQL config `spark.sql.optimizer.inSetConversionThreshold`is set to 10 (by default):
```scala
val set = (0 to 20).map(_.toString).toSet
val data = Seq("1").toDF("x")
data.select($"x".isInCollection(set).as("isInCollection")).show()
```
The function must return **'true'** because "1" is in the set of "0" ... "20" but it returns "false":
```
+--------------+
|isInCollection|
+--------------+
|         false|
+--------------+
```

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
```
$ ./build/sbt "test:testOnly *ColumnExpressionSuite"
```

Closes #28388 from MaxGekk/fix-isInCollection-revert.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b7cabc8)
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun
Copy link
Member

+1, late LGTM. Thank you for reverting.

cloud-fan pushed a commit that referenced this pull request Apr 30, 2020
…f `isInCollection`

### What changes were proposed in this pull request?
- Add tests for different element types of collections that could be passed to `isInCollection`. Added tests for types that can pass the check `In`.`checkInputDataTypes()`.
- Test different switch thresholds in the `isInCollection: Scala Collection` test.

### Why are the changes needed?
To prevent regressions like introduced by #25754 and reverted by #28388

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By existing and new tests in `ColumnExpressionSuite`

Closes #28405 from MaxGekk/test-isInCollection.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Apr 30, 2020
…f `isInCollection`

### What changes were proposed in this pull request?
- Add tests for different element types of collections that could be passed to `isInCollection`. Added tests for types that can pass the check `In`.`checkInputDataTypes()`.
- Test different switch thresholds in the `isInCollection: Scala Collection` test.

### Why are the changes needed?
To prevent regressions like introduced by #25754 and reverted by #28388

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By existing and new tests in `ColumnExpressionSuite`

Closes #28405 from MaxGekk/test-isInCollection.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 9164865)
Signed-off-by: Wenchen Fan <[email protected]>
@MaxGekk MaxGekk deleted the fix-isInCollection-revert branch June 5, 2020 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants