@huaxingao
Contributor

What changes were proposed in this pull request?

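In the following Python code:

df.write.mode("overwrite").insertInto("table")

insertInto ignores mode("overwrite") and appends by default. This patch makes insertInto honor the mode already set on the writer when overwrite is not passed explicitly.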
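
To illustrate why the mode was dropped, here is a minimal sketch that runs without Spark. `FakeJavaWriter` and `OldWriter` are hypothetical stand-ins, not PySpark classes; only the body of `insertInto` mirrors the pre-patch line quoted in the review below, and the `overwrite=False` default is assumed to match the old signature.

```python
class FakeJavaWriter:
    """Hypothetical stand-in for the JVM-side writer; it only records the mode."""

    def __init__(self):
        self.save_mode = None
        self.inserted = []

    def mode(self, save_mode):
        self.save_mode = save_mode
        return self

    def insertInto(self, table_name):
        self.inserted.append((table_name, self.save_mode))


class OldWriter:
    """Sketch of the pre-patch Python wrapper (assumed shape)."""

    def __init__(self):
        self._jwrite = FakeJavaWriter()

    def mode(self, save_mode):
        self._jwrite = self._jwrite.mode(save_mode)
        return self

    def insertInto(self, tableName, overwrite=False):
        # The mode is always reset from `overwrite`, so an earlier
        # mode("overwrite") call is discarded when overwrite stays False.
        self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)


writer = OldWriter()
writer.mode("overwrite").insertInto("table")
print(writer._jwrite.inserted)  # [('table', 'append')]
```

In this sketch the recorded mode is "append" even though "overwrite" was requested, which is the behavior the PR description reports.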

How was this patch tested?

Added a unit test.

@SparkQA

SparkQA commented Jul 16, 2019

Test build #107763 has finished for PR 25175 at commit 566fc84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

        """
        self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)
        if (overwrite):
            self._jwrite.mode("overwrite")
Member

Can we call self.mode("overwrite") instead?

        Optionally overwriting any existing data.
        """
        self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)
        if (overwrite):
Member

if (overwrite) -> if overwrite


    def test_insert_into(self):
        df = self.spark.createDataFrame([("a", 1), ("b", 2)], ["C1", "C2"])
        df.write.saveAsTable("test_table")
Member

Can you use with self.table("test_table"): too?
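
For readers unfamiliar with this pattern, the sketch below shows how a table-cleanup context manager can work, assuming `self.table(...)` drops the named tables when the block exits. `FakeSpark` and the standalone `table` function are hypothetical stand-ins written for illustration; they are not the PySpark test utilities themselves.

```python
from contextlib import contextmanager


class FakeSpark:
    """Hypothetical stand-in for a SparkSession; it only records SQL text."""

    def __init__(self):
        self.statements = []

    def sql(self, statement):
        self.statements.append(statement)


@contextmanager
def table(spark, *table_names):
    """Run the block, then drop the named tables even if the block fails."""
    try:
        yield
    finally:
        for name in table_names:
            spark.sql("DROP TABLE IF EXISTS " + name)


spark = FakeSpark()
try:
    with table(spark, "test_table"):
        # Simulate an assertion failing in the middle of a test.
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

print(spark.statements)  # ['DROP TABLE IF EXISTS test_table']
```

The point of the design is that cleanup runs in a `finally` clause, so a failing assertion cannot leave `test_table` behind for later tests.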

        df.write.insertInto("test_table", False)
        self.assertEqual(4, self.spark.sql("select * from test_table").count())

        # self.spark.sql("drop table test_table")
Member

This commented-out line seems to be a mistake.

Member

@HyukjinKwon left a comment

Looks good otherwise

@SparkQA

SparkQA commented Jul 17, 2019

Test build #107771 has finished for PR 25175 at commit 379eba6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 17, 2019

Test build #107774 has finished for PR 25175 at commit 8f3074f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member

ueshin commented Jul 17, 2019

I'm wondering what happens if a user explicitly specifies overwrite=False.
E.g.,

df.write.mode("overwrite").insertInto("table", overwrite=False)

It might be better to make the default value of the overwrite argument None, and to update the mode only if overwrite is not None.

cc @HyukjinKwon
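
A minimal sketch of the suggestion above, runnable without Spark. `FakeJavaWriter` and `SketchWriter` are hypothetical stand-ins, and `resulting_mode` is a helper invented for this example; the sketch only illustrates the "default to None, update the mode only when overwrite is given" idea and is not presented as the merged implementation.

```python
class FakeJavaWriter:
    """Hypothetical stand-in for the JVM-side writer; it only records the mode."""

    def __init__(self):
        self.save_mode = None

    def mode(self, save_mode):
        self.save_mode = save_mode
        return self

    def insertInto(self, table_name):
        pass  # A real writer would perform the insert here.


class SketchWriter:
    def __init__(self):
        self._jwrite = FakeJavaWriter()

    def mode(self, save_mode):
        self._jwrite = self._jwrite.mode(save_mode)
        return self

    def insertInto(self, tableName, overwrite=None):
        # Touch the mode only when the caller passed overwrite explicitly.
        if overwrite is not None:
            self.mode("overwrite" if overwrite else "append")
        self._jwrite.insertInto(tableName)


def resulting_mode(**kwargs):
    """Set mode("overwrite") first, then report the mode used by insertInto."""
    writer = SketchWriter().mode("overwrite")
    writer.insertInto("table", **kwargs)
    return writer._jwrite.save_mode


print(resulting_mode())                 # overwrite
print(resulting_mode(overwrite=False))  # append
print(resulting_mode(overwrite=True))   # overwrite
```

With a None default, an explicit overwrite=False still wins over an earlier mode("overwrite"), while omitting the argument leaves the earlier mode untouched.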

@HyukjinKwon
Member

Yup, that's fine with me too.

Member

@ueshin left a comment

LGTM, pending tests.

@SparkQA

SparkQA commented Jul 17, 2019

Test build #107793 has finished for PR 25175 at commit 9b167e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019
Closes apache#25175 from huaxingao/spark-28411.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>