Conversation

@wangyum
Member

@wangyum wangyum commented Nov 28, 2019

What changes were proposed in this pull request?

HIVE-12063 improved padding of decimal numbers with trailing zeros to the scale of the column. The following description is copied from HIVE-12063.

HIVE-7373 was to address the problems of trimming trailing zeros by Hive, which caused many problems including treating 0.0, 0.00 and so on as 0, which has different precision/scale. Please refer to the HIVE-7373 description. However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. HIVE-11835 was resolved recently to address one of the problems, where 0.0, 0.00, and so on cannot be read into decimal(1,1).
However, HIVE-11835 didn't address the problem of decimal values such as 0.0, 0.00, etc. showing as 0 in query results. This causes confusion, as 0.0 and 0.00 have a different precision/scale than 0.
The proposal here is to pad query results with zeros to the type's scale. This not only removes the confusion described above, but also aligns with many other DBs. The internal decimal representation doesn't change, however.
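
To make the behavior concrete, here is a minimal sketch of the padding itself (an illustration only, not Spark's actual implementation), using java.math.BigDecimal:

```scala
import java.math.{BigDecimal => JBigDecimal}

// Pad a decimal value to the declared scale of the column.
// setScale only appends zeros here, so no rounding mode is needed;
// it would throw ArithmeticException if the value had to be rounded.
def padToScale(value: String, scale: Int): String =
  new JBigDecimal(value).setScale(scale).toPlainString

padToScale("1", 18)  // "1.000000000000000000"
padToScale("0.0", 1) // "0.0"
```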

Spark SQL:

// bin/spark-sql
spark-sql> select cast(1 as decimal(38, 18));
1
spark-sql>

// bin/beeline
0: jdbc:hive2://localhost:10000/default> select cast(1 as decimal(38, 18));
+----------------------------+--+
| CAST(1 AS DECIMAL(38,18))  |
+----------------------------+--+
| 1.000000000000000000       |
+----------------------------+--+

// bin/spark-shell
scala> spark.sql("select cast(1 as decimal(38, 18))").show(false)
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|1.000000000000000000     |
+-------------------------+

// bin/pyspark
>>> spark.sql("select cast(1 as decimal(38, 18))").show()
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+

// bin/sparkR
> showDF(sql("SELECT cast(1 as decimal(38, 18))"))
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+

PostgreSQL:

postgres=# select cast(1 as decimal(38, 18));
       numeric
----------------------
 1.000000000000000000
(1 row)

Presto:

presto> select cast(1 as decimal(38, 18));
        _col0
----------------------
 1.000000000000000000
(1 row)

How was this patch tested?

Unit tests and a manual test:

spark-sql> select cast(1 as decimal(38, 18));
1.000000000000000000

Spark SQL Upgrading Guide:
![image](https://user-images.githubusercontent.com/5399861/69649620-4405c380-10a8-11ea-84b1-6ee675663b98.png)

@wangyum wangyum changed the title [SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column [SPARK-28461][SQL][test-hadoop3.2] Pad Decimal numbers with trailing zeros to the scale of the column Nov 28, 2019
@wangyum
Member Author

wangyum commented Nov 28, 2019

retest this please

@dongjoon-hyun
Member

Thank you, @wangyum !

@HyukjinKwon
Member

Thanks @dongjoon-hyun and @wangyum

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good if tests pass

@SparkQA

SparkQA commented Nov 28, 2019

Test build #114555 has finished for PR 26697 at commit 23bef9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 28, 2019

Test build #114557 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Nov 28, 2019

The -Phadoop-3.2 -Phive-2.3 build seems to fail for a different reason first.

[info] Building Spark using SBT with these arguments:  -Phadoop-3.2 -Phive-2.3 -Pspark-ganglia-lgpl -Pyarn -Pkubernetes -Pmesos -Phadoop-cloud -Phive -Phive-thriftserver -Pkinesis-asl test:package streaming-kinesis-asl-assembly/assembly
org.mockito.exceptions.base.MockitoException:
ClassCastException occurred while creating the mockito mock:
  class to mock: 'javax.servlet.http.HttpServletRequest', loaded by classloader: 'sun.misc.Launcher$AppClassLoader@490d6c15'
  created class: 'org.mockito.codegen.HttpServletRequest$MockitoMock$254323811', loaded by classloader: 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@6aed7392'
  proxy instance class: 'org.mockito.codegen.HttpServletRequest$MockitoMock$254323811', loaded by classloader: 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@6aed7392'
  instance creation by: ObjenesisInstantiator

You might experience classloading issues, please ask the mockito mailing-list.

I saw the exact same failure in another PR, too. Let's re-trigger after midnight (PST).

@wangyum
Member Author

wangyum commented Nov 28, 2019

retest this please

@SparkQA

SparkQA commented Nov 28, 2019

Test build #114566 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 28, 2019

@shahidki31 Could you fix this issue: https://issues.apache.org/jira/browse/SPARK-30068?

@shahidki31
Contributor

Thanks @wangyum. I will look into it. Not sure why it didn't fail earlier.

@wangyum
Member Author

wangyum commented Nov 28, 2019

@shahidki31 You can reproduce this issue via GitHub Actions: https://github.com/spark-thriftserver/spark/pull/1/checks?check_run_id=324625735

@shahidki31
Contributor

shahidki31 commented Nov 28, 2019

@wangyum I couldn't reproduce it locally yet; I will check again. It seems to be an issue with mocking HttpServletRequest, but tests that do the same thing (like AllExecutionPageSuite.scala) seem to pass.
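
For context, the failing suites build the mock roughly like this (a sketch of the pattern under discussion, not the exact test code):

```scala
import javax.servlet.http.HttpServletRequest
import org.mockito.Mockito.{mock, when}

// The UI test suites mock the servlet request to drive page rendering;
// the MockitoException above is thrown while generating this mock class.
val request = mock(classOf[HttpServletRequest])
when(request.getParameter("id")).thenReturn("0")
```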

@wangyum
Member Author

wangyum commented Nov 28, 2019

Yes. I also couldn't reproduce it on my local machine, but I can reproduce it via GitHub Actions.

@shahidki31
Contributor

@wangyum I am not sure why it suddenly started failing. If it is a blocker, we can revert the test for the time being, and I'll raise it again once the root cause is found.

@wangyum
Member Author

wangyum commented Nov 29, 2019

retest this please

@HyukjinKwon
Member

Can we hold on for a moment? I am investigating the test failure at #26706.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Nov 29, 2019

Test build #114594 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2019

Test build #114598 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

^ this test failure will be fixed together at #26710

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Nov 30, 2019

Test build #114649 has finished for PR 26697 at commit 23bef9c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 30, 2019

retest this please

@SparkQA

SparkQA commented Nov 30, 2019

Test build #114657 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 30, 2019

retest this please

@HyukjinKwon
Member

Should really be fixed now...

@SparkQA

SparkQA commented Nov 30, 2019

Test build #114668 has finished for PR 26697 at commit 23bef9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-28461][SQL][test-hadoop3.2] Pad Decimal numbers with trailing zeros to the scale of the column [SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column Dec 1, 2019
@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Dec 1, 2019

Test build #114680 has finished for PR 26697 at commit 23bef9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Dec 6, 2019
[SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column

Closes apache#26697 from wangyum/SPARK-28461.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…zeros (apache#243)

* [HADP-45102] Add config not to pad decimal with trailing zeros for compatibility with 2.3.1 (apache#101)

### What changes were proposed in this pull request?
Add config `spark.sql.legacy.decimal.padTrailingZeros` so that padding decimals with trailing zeros can be disabled for the spark-sql interface

### Why are the changes needed?
In 2.3.1, decimals are not padded with trailing zeros; this was changed in apache#26697. This config keeps compatibility with 2.3.1.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT updated.

Co-authored-by: tianlzhang <[email protected]>
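
A minimal sketch of how such a legacy flag would be used, assuming the fork's config name from the commit above (it is not an upstream Apache Spark option):

```scala
// Hypothetical usage of the fork-specific config named in the commit above;
// spark.sql.legacy.decimal.padTrailingZeros is not an upstream Spark option.
spark.sql("SET spark.sql.legacy.decimal.padTrailingZeros=false")

// With padding disabled, the result would render as 1
// rather than 1.000000000000000000.
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18))").show(false)
```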