Conversation

@wangyum
Member

@wangyum wangyum commented Nov 28, 2019

What changes were proposed in this pull request?

HIVE-12063 improved padding of decimal numbers with trailing zeros to the scale of the column. The following description is copied from HIVE-12063.

HIVE-7373 was to address the problems of trimming trailing zeros by Hive, which caused many problems including treating 0.0, 0.00 and so on as 0, which has different precision/scale. Please refer to the HIVE-7373 description. However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. HIVE-11835 was resolved recently to address one of the problems, where 0.0, 0.00, and so on cannot be read into decimal(1,1).
However, HIVE-11835 didn't address the problem of decimal values such as 0.0, 0.00, etc. showing as 0 in query results. This causes confusion, as 0.0 and 0.00 have a different precision/scale than 0.
The proposal here is to pad query results with zeros to the type's scale. This not only removes the confusion described above, but also aligns with many other DBs. The internal decimal representation doesn't change, however.
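
To make the behavior concrete, here is a minimal sketch of the padding itself (an illustration only, not Spark's actual implementation), using java.math.BigDecimal:

```scala
import java.math.{BigDecimal => JBigDecimal}

// Pad a decimal value to the declared scale of the column.
// setScale only appends zeros here, so no rounding mode is needed;
// it would throw ArithmeticException if the value had to be rounded.
def padToScale(value: String, scale: Int): String =
  new JBigDecimal(value).setScale(scale).toPlainString

padToScale("1", 18)  // "1.000000000000000000"
padToScale("0.0", 1) // "0.0"
```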

Spark SQL:

// bin/spark-sql
spark-sql> select cast(1 as decimal(38, 18));
1
spark-sql>

// bin/beeline
0: jdbc:hive2://localhost:10000/default> select cast(1 as decimal(38, 18));
+----------------------------+--+
| CAST(1 AS DECIMAL(38,18))  |
+----------------------------+--+
| 1.000000000000000000       |
+----------------------------+--+

// bin/spark-shell
scala> spark.sql("select cast(1 as decimal(38, 18))").show(false)
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|1.000000000000000000     |
+-------------------------+

// bin/pyspark
>>> spark.sql("select cast(1 as decimal(38, 18))").show()
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+

// bin/sparkR
> showDF(sql("SELECT cast(1 as decimal(38, 18))"))
+-------------------------+
|CAST(1 AS DECIMAL(38,18))|
+-------------------------+
|     1.000000000000000000|
+-------------------------+

PostgreSQL:

postgres=# select cast(1 as decimal(38, 18));
       numeric
----------------------
 1.000000000000000000
(1 row)

Presto:

presto> select cast(1 as decimal(38, 18));
        _col0
----------------------
 1.000000000000000000
(1 row)

How was this patch tested?

Unit tests and a manual test:

spark-sql> select cast(1 as decimal(38, 18));
1.000000000000000000

Spark SQL Upgrading Guide:
![image](https://user-images.githubusercontent.com/5399861/69649620-4405c380-10a8-11ea-84b1-6ee675663b98.png)

@wangyum wangyum changed the title [SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column [SPARK-28461][SQL][test-hadoop3.2] Pad Decimal numbers with trailing zeros to the scale of the column Nov 28, 2019
@wangyum
Member Author

wangyum commented Nov 28, 2019

retest this please

@dongjoon-hyun
Member

Thank you, @wangyum !

@HyukjinKwon
Member

Thanks @dongjoon-hyun and @wangyum

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good if tests pass

@SparkQA

SparkQA commented Nov 28, 2019

Test build #114555 has finished for PR 26697 at commit 23bef9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 28, 2019

Test build #114557 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Nov 28, 2019

The -Phadoop-3.2 -Phive-2.3 build seems to fail for a different reason first.

[info] Building Spark using SBT with these arguments:  -Phadoop-3.2 -Phive-2.3 -Pspark-ganglia-lgpl -Pyarn -Pkubernetes -Pmesos -Phadoop-cloud -Phive -Phive-thriftserver -Pkinesis-asl test:package streaming-kinesis-asl-assembly/assembly
org.mockito.exceptions.base.MockitoException:
ClassCastException occurred while creating the mockito mock:
  class to mock: 'javax.servlet.http.HttpServletRequest', loaded by classloader: 'sun.misc.Launcher$AppClassLoader@490d6c15'
  created class: 'org.mockito.codegen.HttpServletRequest$MockitoMock$254323811', loaded by classloader: 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@6aed7392'
  proxy instance class: 'org.mockito.codegen.HttpServletRequest$MockitoMock$254323811', loaded by classloader: 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@6aed7392'
  instance creation by: ObjenesisInstantiator

You might experience classloading issues, please ask the mockito mailing-list.

I saw the exact same failure in another PR, too. Let's re-trigger after midnight (PST).

@wangyum
Member Author

wangyum commented Nov 28, 2019

retest this please

@SparkQA

SparkQA commented Nov 28, 2019

Test build #114566 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 28, 2019

@shahidki31 Could you fix this issue: https://issues.apache.org/jira/browse/SPARK-30068?

@shahidki31
Contributor

Thanks @wangyum. I will look into it. Not sure why it didn't fail earlier.

@wangyum
Member Author

wangyum commented Nov 28, 2019

@shahidki31 You can reproduce this issue via GitHub Actions: https://github.com/spark-thriftserver/spark/pull/1/checks?check_run_id=324625735

@shahidki31
Contributor

shahidki31 commented Nov 28, 2019

@wangyum I couldn't reproduce it locally yet; I will check again. It seems to be an issue with mocking HttpServletRequest, but tests that do the same thing (like AllExecutionPageSuite.scala) seem to pass.
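
For context, the failing suites build the mock roughly like this (a sketch of the pattern under discussion, not the exact test code):

```scala
import javax.servlet.http.HttpServletRequest
import org.mockito.Mockito.{mock, when}

// The UI test suites mock the servlet request to drive page rendering;
// the MockitoException above is thrown while generating this mock class.
val request = mock(classOf[HttpServletRequest])
when(request.getParameter("id")).thenReturn("0")
```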

@wangyum
Member Author

wangyum commented Nov 28, 2019

Yes. I also couldn't reproduce it on my local machine, but I can reproduce it via GitHub Actions.

@shahidki31
Contributor

@wangyum I am not sure why it suddenly started failing. If it is a blocker, we can revert the test for the time being, and I'll raise it again once the root cause is found.

@wangyum
Member Author

wangyum commented Nov 29, 2019

retest this please

@HyukjinKwon
Member

Can we hold on for a moment? I am investigating the test failure at #26706.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Nov 29, 2019

Test build #114594 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2019

Test build #114598 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

^ this test failure will be fixed together at #26710

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Nov 30, 2019

Test build #114649 has finished for PR 26697 at commit 23bef9c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 30, 2019

retest this please

@SparkQA

SparkQA commented Nov 30, 2019

Test build #114657 has finished for PR 26697 at commit 23bef9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 30, 2019

retest this please

@HyukjinKwon
Member

Should really be fixed now...

@SparkQA

SparkQA commented Nov 30, 2019

Test build #114668 has finished for PR 26697 at commit 23bef9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-28461][SQL][test-hadoop3.2] Pad Decimal numbers with trailing zeros to the scale of the column [SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column Dec 1, 2019
@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Dec 1, 2019

Test build #114680 has finished for PR 26697 at commit 23bef9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Dec 6, 2019
[SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column

Closes apache#26697 from wangyum/SPARK-28461.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…zeros (apache#243)

* [HADP-45102] Add config not to pad decimal with trailing zeros for compatibility with 2.3.1 (apache#101)

### What changes were proposed in this pull request?
Add config `spark.sql.legacy.decimal.padTrailingZeros` so that padding decimals with trailing zeros can be disabled for the spark-sql interface

### Why are the changes needed?
In 2.3.1, decimals are not padded with trailing zeros; this was changed in apache#26697. This config keeps compatibility with 2.3.1.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT updated.

Co-authored-by: tianlzhang <[email protected]>
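
A minimal sketch of how such a legacy flag would be used, assuming the fork's config name from the commit above (it is not an upstream Apache Spark option):

```scala
// Hypothetical usage of the fork-specific config named in the commit above;
// spark.sql.legacy.decimal.padTrailingZeros is not an upstream Spark option.
spark.sql("SET spark.sql.legacy.decimal.padTrailingZeros=false")

// With padding disabled, the result would render as 1
// rather than 1.000000000000000000.
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18))").show(false)
```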