[SPARK-34905][SQL][TESTS] Enable ANSI intervals in SQLQueryTestSuite/ThriftServerQueryTestSuite
#32099
Conversation
Kubernetes integration test starting

Kubernetes integration test status failure

Test build #137094 has finished for PR 32099 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #137101 has finished for PR 32099 at commit
### What changes were proposed in this pull request?
1. Extend the `IntervalUtils` methods `toYearMonthIntervalString` and `toDayTimeIntervalString` to support formatting of year-month/day-time intervals in Hive style. The methods get a new parameter `style`, which can have two values: `HIVE_STYLE` and `ANSI_STYLE`.
2. Invoke `toYearMonthIntervalString` and `toDayTimeIntervalString` from the `Cast` expression with the `style` parameter set to `ANSI_STYLE`.
3. Invoke `toYearMonthIntervalString` and `toDayTimeIntervalString` from `HiveResult` with `style` set to `HIVE_STYLE`.

### Why are the changes needed?
The `spark-sql` shell formats its output in Hive style by using `HiveResult.hiveResultString()`. The changes are needed to match Hive's behavior. For instance, Hive:
```sql
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
+-----------------------+
|          _c0          |
+-----------------------+
| 1 01:02:03.000001000  |
+-----------------------+
```
Spark before the changes:
```sql
spark-sql> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
INTERVAL '1 01:02:03.000001' DAY TO SECOND
```
Also, this should unblock #32099, which enables `*.sql` tests in `SQLQueryTestSuite`.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes:
```sql
spark-sql> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
1 01:02:03.000001000
```

### How was this patch tested?
1. Added new tests to `IntervalUtilsSuite`:
```
$ build/sbt "test:testOnly *IntervalUtilsSuite"
```
2. Modified existing tests in `HiveResultSuite`:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "testOnly *HiveResultSuite"
```
3. By running cast tests:
```
$ build/sbt "testOnly *CastSuite*"
```

Closes #32120 from MaxGekk/ansi-intervals-hive-thrift-server.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
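The contrast between the two output styles can be sketched as follows. This is a hypothetical, self-contained illustration of the behavior described above, not Spark's actual `IntervalUtils` code; the class, method, and enum names are simplified for the example:

```java
import java.time.Duration;

// Hypothetical sketch of the two day-time interval output styles discussed
// above. Spark's real logic lives in IntervalUtils.toDayTimeIntervalString;
// the names here are illustrative only.
public class DayTimeIntervalFormat {
    enum Style { ANSI_STYLE, HIVE_STYLE }

    static String format(Duration d, Style style) {
        long s = d.getSeconds();
        long days = s / 86400;
        long hours = (s % 86400) / 3600;
        long minutes = (s % 3600) / 60;
        long seconds = s % 60;
        if (style == Style.HIVE_STYLE) {
            // Hive pads fractional seconds to nanosecond precision (9 digits).
            return String.format("%d %02d:%02d:%02d.%09d",
                days, hours, minutes, seconds, d.getNano());
        }
        // ANSI style wraps the value in INTERVAL ... DAY TO SECOND and
        // shows microsecond precision (6 digits).
        return String.format("INTERVAL '%d %02d:%02d:%02d.%06d' DAY TO SECOND",
            days, hours, minutes, seconds, d.getNano() / 1000);
    }

    public static void main(String[] args) {
        // timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31'
        Duration d = Duration.ofDays(1).plusHours(1).plusMinutes(2)
            .plusSeconds(3).plusNanos(1000);
        System.out.println(format(d, Style.HIVE_STYLE)); // 1 01:02:03.000001000
        System.out.println(format(d, Style.ANSI_STYLE)); // INTERVAL '1 01:02:03.000001' DAY TO SECOND
    }
}
```

Both printed lines match the before/after outputs quoted in the commit message above.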
### What changes were proposed in this pull request?
1. Map Catalyst's interval types to Hive's types:
- `YearMonthIntervalType` -> `interval_year_month`
- `DayTimeIntervalType` -> `interval_day_time`
2. Invoke `HiveResult.toHiveString()` to convert the external interval types `java.time.Period`/`java.time.Duration` to strings.
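The two steps above can be sketched roughly as below. The class and helper names are hypothetical (Spark's actual conversion goes through `HiveResult.toHiveString()` in the Thrift server code path), and the "years-months" rendering for year-month intervals is an assumption about Hive's display format:

```java
import java.time.Duration;
import java.time.Period;
import java.util.Map;

// Hypothetical sketch of the two steps above; names are illustrative,
// not Spark's actual Thrift-server code.
public class ThriftIntervalSupport {
    // Step 1: Catalyst interval type names mapped to Hive type names.
    static final Map<String, String> TYPE_NAMES = Map.of(
        "YearMonthIntervalType", "interval_year_month",
        "DayTimeIntervalType", "interval_day_time");

    // Step 2: render the external value types as Hive-style strings.
    // Assumption: Hive prints year-month intervals as "years-months".
    static String toHiveString(Object value) {
        if (value instanceof Period) {
            long months = ((Period) value).toTotalMonths();
            return String.format("%d-%d", months / 12, Math.abs(months % 12));
        }
        if (value instanceof Duration) {
            Duration d = (Duration) value;
            long s = d.getSeconds();
            // Day-time intervals print as "days hh:mm:ss" with the
            // fractional seconds padded to nanosecond precision.
            return String.format("%d %02d:%02d:%02d.%09d",
                s / 86400, (s % 86400) / 3600, (s % 3600) / 60, s % 60,
                d.getNano());
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(TYPE_NAMES.get("DayTimeIntervalType")); // interval_day_time
        System.out.println(toHiveString(Period.of(1, 2, 0)));      // 1-2
    }
}
```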
### Why are the changes needed?
1. To be able to retrieve ANSI intervals via Hive Thrift server.
2. This fixes the issue:
```sql
$ ./sbin/start-thriftserver.sh
$ ./bin/beeline
Beeline version 2.3.8 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/default "" "" ""
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.2.0-SNAPSHOT)
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
Error: java.lang.IllegalArgumentException: Unrecognized type name: day-time interval (state=,code=0)
```
3. It should unblock #32099, which enables `*.sql` tests in `ThriftServerQueryTestSuite`.
### Does this PR introduce _any_ user-facing change?
Yes. After the changes:
```sql
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
+----------------------------------------------------+
| subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31') |
+----------------------------------------------------+
| 1 01:02:03.000001000 |
+----------------------------------------------------+
1 row selected (1.637 seconds)
```
### How was this patch tested?
By running the new tests:
```
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkThriftServerProtocolVersionsSuite"
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkMetadataOperationSuite"
```
Also checked an array of an interval:
```sql
0: jdbc:hive2://localhost:10000/default> select array(timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31');
+----------------------------------------------------+
| array(subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31')) |
+----------------------------------------------------+
| [1 01:02:03.000001000] |
+----------------------------------------------------+
```
Closes #32121 from MaxGekk/ansi-intervals-thrift-protocol.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Kubernetes integration test starting

Kubernetes integration test status failure

@yaooqinn @AngersZhuuuu @HyukjinKwon @cloud-fan Could you review this PR, please?
```diff
 select date'2020-01-01' - timestamp'2019-10-06 10:11:12.345678'
 -- !query schema
-struct<subtracttimestamps(DATE '2020-01-01', TIMESTAMP '2019-10-06 10:11:12.345678'):interval>
+struct<subtracttimestamps(DATE '2020-01-01', TIMESTAMP '2019-10-06 10:11:12.345678'):day-time interval>
```
Have we finalized the SQL name for the new interval types? How about other databases?
Not yet. So far, I took the names for the sub-types from the SQL standard, see #31810.
Probably, we will need to re-define them when we implement parsing of interval types from SQL.
thanks, merging to master!

Test build #137200 has finished for PR 32099 at commit
### What changes were proposed in this pull request?
Remove the `spark.sql.legacy.interval.enabled` settings from `SQLQueryTestSuite`/`ThriftServerQueryTestSuite`, which enables new ANSI intervals by default.

### Why are the changes needed?
To use default settings for intervals, and to test the new ANSI intervals (year-month and day-time) introduced by SPARK-27793.

### Does this PR introduce _any_ user-facing change?
Should not, because this affects tests only.

### How was this patch tested?
By running the affected tests, for instance:
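The exact commands are not quoted in this thread; based on the `build/sbt` invocations shown in the commit messages above, they would presumably look something like:

```shell
# Presumed invocations (assumption: the thread does not give the exact
# commands); these follow the build/sbt patterns used elsewhere above.
$ build/sbt "test:testOnly *SQLQueryTestSuite"
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *ThriftServerQueryTestSuite"
```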