Commit c7c2bda
[SPARK-30065][SQL][2.4] DataFrameNaFunctions.drop should handle duplicate columns
(Backport of #26700)
### What changes were proposed in this pull request?
`DataFrameNaFunctions.drop` doesn't handle duplicate columns even when column names are not specified.
```Scala
val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
val df = left.join(right, Seq("col1"))
df.printSchema
df.na.drop("any").show
```
produces
```
root
|-- col1: string (nullable = true)
|-- col2: string (nullable = true)
|-- col2: string (nullable = true)
org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
```
The reason for the above failure is that columns are resolved by name and if there are multiple columns with the same name, it will fail due to ambiguity.
This PR updates `DataFrameNaFunctions.drop` such that if the columns to drop are not specified, it will resolve ambiguity gracefully by applying `drop` to all the eligible columns. (Note that if the user specifies the columns, it will still continue to fail due to ambiguity).
### Why are the changes needed?
If column names are not specified, `drop` should not fail due to ambiguity since it should still be able to apply `drop` to the eligible columns.
### Does this PR introduce any user-facing change?
Yes, now all the rows with nulls are dropped in the above example:
```
scala> df.na.drop("any").show
+----+----+----+
|col1|col2|col2|
+----+----+----+
+----+----+----+
```
### How was this patch tested?
Added new unit tests.
Closes #27411 from imback82/backport-SPARK-30065.
Authored-by: Terry Kim <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>1 parent 4c3c1d6 commit c7c2bda
File tree
2 files changed
+43
-14
lines changed- sql/core/src
- main/scala/org/apache/spark/sql
- test/scala/org/apache/spark/sql
2 files changed
+43
-14
lines changedLines changed: 21 additions & 11 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | | - | |
| 44 | + | |
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
54 | | - | |
| 54 | + | |
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
| |||
90 | 90 | | |
91 | 91 | | |
92 | 92 | | |
93 | | - | |
94 | | - | |
95 | | - | |
96 | | - | |
97 | | - | |
| 93 | + | |
98 | 94 | | |
99 | 95 | | |
100 | 96 | | |
| |||
120 | 116 | | |
121 | 117 | | |
122 | 118 | | |
123 | | - | |
124 | | - | |
125 | | - | |
126 | | - | |
| 119 | + | |
127 | 120 | | |
128 | 121 | | |
129 | 122 | | |
| |||
488 | 481 | | |
489 | 482 | | |
490 | 483 | | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
491 | 501 | | |
492 | 502 | | |
493 | 503 | | |
| |||
Lines changed: 22 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
240 | 240 | | |
241 | 241 | | |
242 | 242 | | |
243 | | - | |
| 243 | + | |
244 | 244 | | |
245 | 245 | | |
246 | 246 | | |
| 247 | + | |
247 | 248 | | |
248 | 249 | | |
249 | | - | |
| 250 | + | |
250 | 251 | | |
251 | 252 | | |
252 | 253 | | |
| |||
263 | 264 | | |
264 | 265 | | |
265 | 266 | | |
266 | | - | |
| 267 | + | |
267 | 268 | | |
| 269 | + | |
268 | 270 | | |
269 | 271 | | |
270 | 272 | | |
| |||
394 | 396 | | |
395 | 397 | | |
396 | 398 | | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
| 406 | + | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
397 | 416 | | |
0 commit comments