[SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored #25175
Test build #107763 has finished for PR 25175 at commit
python/pyspark/sql/readwriter.py
Outdated
      """
-     self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)
+     if (overwrite):
+         self._jwrite.mode("overwrite")
Can we call `self.mode("overwrite")` instead?
python/pyspark/sql/readwriter.py
Outdated
      Optionally overwriting any existing data.
      """
-     self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)
+     if (overwrite):
`if (overwrite)` -> `if overwrite`
+ def test_insert_into(self):
+     df = self.spark.createDataFrame([("a", 1), ("b", 2)], ["C1", "C2"])
+     df.write.saveAsTable("test_table")
Can you use `with self.table("test_table"):` too?
+     df.write.insertInto("test_table", False)
+     self.assertEqual(4, self.spark.sql("select * from test_table").count())

+     # self.spark.sql("drop table test_table")
seems a mistake.
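Both test comments above concern cleanup: a manual `drop table` at the end of a test is skipped when an assertion fails, whereas a context manager drops the table either way. Below is a minimal sketch of that pattern. `FakeCatalog` and this standalone `table` helper are hypothetical stand-ins so the example runs without Spark; they are not the PySpark test utilities themselves.

```python
from contextlib import contextmanager


class FakeCatalog:
    """Hypothetical stand-in for a Spark session's table catalog."""

    def __init__(self):
        self.tables = set()

    def save_as_table(self, name):
        self.tables.add(name)

    def drop_if_exists(self, name):
        self.tables.discard(name)


@contextmanager
def table(catalog, *names):
    """Drop the named tables on exit, even if the body raises."""
    try:
        yield
    finally:
        for name in names:
            catalog.drop_if_exists(name)


catalog = FakeCatalog()
try:
    with table(catalog, "test_table"):
        catalog.save_as_table("test_table")
        # Simulate a failing assertion in the middle of the test body.
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

print("test_table" in catalog.tables)  # False: the table was dropped anyway
```

With this shape the test body needs no explicit drop statement, so there is nothing left to comment out by accident.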
HyukjinKwon
left a comment
Looks good otherwise
Test build #107771 has finished for PR 25175 at commit
Test build #107774 has finished for PR 25175 at commit
I'm wondering what happens if a user explicitly specifies `df.write.mode("overwrite").insertInto("table", overwrite=False)`. It might be better to change the default value for `overwrite`.
cc @HyukjinKwon
Yup, that's fine with me too.
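One way to resolve the ambiguity raised above is a tri-state default, sketched here under the assumption that `overwrite` defaults to `None`: an explicit `True` or `False` wins, and `None` leaves whatever `mode(...)` set earlier untouched. `FakeJavaWriter` and `SketchWriter` are hypothetical names used so the example runs without Spark, and the `"append"` starting mode is an assumption of the sketch, not a statement about the JVM writer.

```python
class FakeJavaWriter:
    """Hypothetical stand-in for the JVM-side writer; records what it is asked to do."""

    def __init__(self):
        self.save_mode = "append"  # assumed starting mode for this sketch
        self.inserted = []

    def mode(self, save_mode):
        self.save_mode = save_mode
        return self

    def insertInto(self, table_name):
        self.inserted.append((table_name, self.save_mode))


class SketchWriter:
    """Hypothetical Python-side writer illustrating the tri-state `overwrite` flag."""

    def __init__(self):
        self._jwrite = FakeJavaWriter()

    def mode(self, save_mode):
        self._jwrite = self._jwrite.mode(save_mode)
        return self

    def insertInto(self, tableName, overwrite=None):
        # Only touch the mode when the caller passed the flag explicitly.
        if overwrite is not None:
            self.mode("overwrite" if overwrite else "append")
        self._jwrite.insertInto(tableName)


w = SketchWriter()
w.mode("overwrite").insertInto("table")
print(w._jwrite.inserted)  # [('table', 'overwrite')]

w = SketchWriter()
w.mode("overwrite").insertInto("table", overwrite=False)
print(w._jwrite.inserted)  # [('table', 'append')]
```

The second call shows the design choice: an explicit `overwrite=False` takes precedence over an earlier `mode("overwrite")`, while omitting the argument keeps the earlier mode.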
ueshin
left a comment
LGTM, pending tests.
Test build #107793 has finished for PR 25175 at commit
Merged to master.
## What changes were proposed in this pull request?
In the following Python code,
```
df.write.mode("overwrite").insertInto("table")
```
`insertInto` ignores `mode("overwrite")` and appends by default.
## How was this patch tested?
Added a unit test.
Closes apache#25175 from huaxingao/spark-28411.
Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>