Conversation

@MaxGekk (Member) commented Apr 8, 2021

What changes were proposed in this pull request?

Remove the `spark.sql.legacy.interval.enabled` setting from `SQLQueryTestSuite`/`ThriftServerQueryTestSuite`, which enables the new ANSI intervals by default in those suites.
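
For context, the kind of override being removed is roughly the following. This is a minimal sketch assuming a session-level config setter; it is not the actual suite code and is not part of the PR:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch (hypothetical, not the actual SQLQueryTestSuite code).
// Before this PR, the test suites forced the legacy interval behavior on;
// dropping the override leaves the config at its default, so the new ANSI
// year-month and day-time intervals are what the golden files exercise.
object LegacyIntervalOverrideSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("legacy-interval-sketch")
      .master("local[1]")
      .getOrCreate()

    // The kind of override being removed from the test suites:
    spark.conf.set("spark.sql.legacy.interval.enabled", "true")

    // After this PR the suites rely on the default value instead:
    spark.conf.unset("spark.sql.legacy.interval.enabled")

    spark.stop()
  }
}
```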

Why are the changes needed?

To use the default settings for intervals, and to test the new ANSI intervals: the year-month and day-time intervals introduced by SPARK-27793.

Does this PR introduce any user-facing change?

It should not, because this affects tests only.

How was this patch tested?

By running the affected tests, for instance:

$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z timestamp.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"

@github-actions bot added the SQL label Apr 8, 2021
@SparkQA commented Apr 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41672/

@SparkQA commented Apr 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41672/

@SparkQA commented Apr 8, 2021

Test build #137094 has finished for PR 32099 at commit 32a0877.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UpdateStarAction(condition: Option[Expression]) extends MergeAction
  • case class InsertStarAction(condition: Option[Expression]) extends MergeAction

@SparkQA commented Apr 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41679/

@SparkQA commented Apr 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41679/

@SparkQA commented Apr 9, 2021

Test build #137101 has finished for PR 32099 at commit eb3cebc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Apr 10, 2021

Waiting for #32120 and #32121

MaxGekk added a commit that referenced this pull request Apr 11, 2021
### What changes were proposed in this pull request?
1. Extend the `IntervalUtils` methods `toYearMonthIntervalString` and `toDayTimeIntervalString` to support formatting of year-month/day-time intervals in Hive style. The methods get a new parameter, `style`, which can have two values: `HIVE_STYLE` and `ANSI_STYLE` (see the sketch after this list).
2. Invoke `toYearMonthIntervalString` and `toDayTimeIntervalString` from the `Cast` expression with the `style` parameter set to `ANSI_STYLE`.
3. Invoke `toYearMonthIntervalString` and `toDayTimeIntervalString` from `HiveResult` with `style` set to `HIVE_STYLE`.
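
For illustration, here is a self-contained sketch of the two output styles described in item 1. It does not reproduce the actual `IntervalUtils` signatures; only the style names `HIVE_STYLE` and `ANSI_STYLE` come from the description above, everything else is assumed:

```scala
// Illustrative sketch only, not the real IntervalUtils implementation.
object DayTimeIntervalFormatSketch {
  sealed trait Style
  case object HIVE_STYLE extends Style
  case object ANSI_STYLE extends Style

  private val MICROS_PER_SECOND = 1000000L

  // `micros` is the total interval length in microseconds, which is how
  // Catalyst stores day-time interval values internally.
  def toDayTimeIntervalString(micros: Long, style: Style): String = {
    val sign = if (micros < 0) "-" else ""
    val abs = math.abs(micros)
    val totalSeconds = abs / MICROS_PER_SECOND
    val fracMicros = abs % MICROS_PER_SECOND
    val days = totalSeconds / 86400
    val hours = totalSeconds % 86400 / 3600
    val minutes = totalSeconds % 3600 / 60
    val seconds = totalSeconds % 60
    style match {
      case HIVE_STYLE =>
        // Hive prints a nanosecond fraction, e.g. "1 01:02:03.000001000"
        f"$sign$days $hours%02d:$minutes%02d:$seconds%02d.${fracMicros * 1000}%09d"
      case ANSI_STYLE =>
        // ANSI literal form, e.g. "INTERVAL '1 01:02:03.000001' DAY TO SECOND"
        f"INTERVAL '$sign$days $hours%02d:$minutes%02d:$seconds%02d.$fracMicros%06d' DAY TO SECOND"
    }
  }
}
```

With `micros = 90123000001L` (1 day, 01:02:03 and 1 microsecond), the sketch yields `1 01:02:03.000001000` in Hive style and `INTERVAL '1 01:02:03.000001' DAY TO SECOND` in ANSI style, matching the outputs in the examples below.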

### Why are the changes needed?
The `spark-sql` shell formats its output in Hive style by using `HiveResult.hiveResultString()`. The changes are needed to match Hive behavior. For instance,

Hive:
```sql
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
+-----------------------+
|          _c0          |
+-----------------------+
| 1 01:02:03.000001000  |
+-----------------------+
```

Spark before the changes:
```sql
spark-sql> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
INTERVAL '1 01:02:03.000001' DAY TO SECOND
```

Also, this should unblock #32099, which enables `*.sql` tests in `SQLQueryTestSuite`.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes:
```sql
spark-sql> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
1 01:02:03.000001000
```

### How was this patch tested?
1. Added new tests to `IntervalUtilsSuite`:
```
$  build/sbt "test:testOnly *IntervalUtilsSuite"
```
2. Modified existing tests in `HiveResultSuite`:
```
$  build/sbt -Phive-2.3 -Phive-thriftserver "testOnly *HiveResultSuite"
```
3. By running cast tests:
```
$ build/sbt "testOnly *CastSuite*"
```

Closes #32120 from MaxGekk/ansi-intervals-hive-thrift-server.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Apr 12, 2021
### What changes were proposed in this pull request?
1. Map Catalyst's interval types to Hive's types:
    - `YearMonthIntervalType` -> `interval_year_month`
    - `DayTimeIntervalType` -> `interval_day_time`
2. Invoke `HiveResult.toHiveString()` to convert the external interval types `java.time.Period`/`java.time.Duration` to strings (see the sketch after this list).
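
For illustration, a dependency-free sketch of the two steps above. The object and method names here are hypothetical, and the Catalyst types are represented by their type-name strings so the snippet stays self-contained; the real change lives in Spark's Thrift server and `HiveResult` code:

```scala
import java.time.{Duration, Period}

// Hypothetical helper, not the actual Spark code.
object ThriftIntervalSketch {
  // Step 1: map a Catalyst interval type (by name, to avoid a Spark dependency
  // in this sketch) to the Hive type name reported over the Thrift protocol.
  def toHiveTypeName(catalystTypeName: String): String = catalystTypeName match {
    case "year-month interval" => "interval_year_month"
    case "day-time interval"   => "interval_day_time"
    case other                 => other
  }

  // Step 2: render the external Java values as strings, roughly in the Hive
  // style shown in the examples below; negative values are not handled here.
  def toHiveString(value: Any): String = value match {
    case p: Period =>
      val months = p.toTotalMonths
      s"${months / 12}-${months % 12}"
    case d: Duration =>
      val seconds = d.getSeconds
      val days = seconds / 86400
      val rest = seconds % 86400
      f"$days ${rest / 3600}%02d:${rest % 3600 / 60}%02d:${rest % 60}%02d.${d.getNano}%09d"
    case other => String.valueOf(other)
  }
}
```

For example, `toHiveString(Duration.ofDays(1).plusHours(1).plusMinutes(2).plusSeconds(3).plusNanos(1000))` produces `1 01:02:03.000001000`, the value shown in the beeline output below.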

### Why are the changes needed?
1. To be able to retrieve ANSI intervals via the Hive Thrift server.
2. This fixes the issue:
```sql
 $ ./sbin/start-thriftserver.sh
 $ ./bin/beeline
Beeline version 2.3.8 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/default "" "" ""
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.2.0-SNAPSHOT)
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
Error: java.lang.IllegalArgumentException: Unrecognized type name: day-time interval (state=,code=0)
```
3. It should unblock #32099, which enables `*.sql` tests in `ThriftServerQueryTestSuite`.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes:
```sql
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
+----------------------------------------------------+
| subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31') |
+----------------------------------------------------+
| 1 01:02:03.000001000                               |
+----------------------------------------------------+
1 row selected (1.637 seconds)
```

### How was this patch tested?
By running the new tests:
```
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkThriftServerProtocolVersionsSuite"
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkMetadataOperationSuite"
```
Also checked an array containing an interval:
```sql
0: jdbc:hive2://localhost:10000/default> select array(timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31');
+----------------------------------------------------+
| array(subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31')) |
+----------------------------------------------------+
| [1 01:02:03.000001000]                             |
+----------------------------------------------------+
```

Closes #32121 from MaxGekk/ansi-intervals-thrift-protocol.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@SparkQA commented Apr 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41779/

@SparkQA commented Apr 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41779/

@MaxGekk (Member, Author) commented Apr 12, 2021

@yaooqinn @AngersZhuuuu @HyukjinKwon @cloud-fan Could you review this PR, please?

select date'2020-01-01' - timestamp'2019-10-06 10:11:12.345678'
-- !query schema
-struct<subtracttimestamps(DATE '2020-01-01', TIMESTAMP '2019-10-06 10:11:12.345678'):interval>
+struct<subtracttimestamps(DATE '2020-01-01', TIMESTAMP '2019-10-06 10:11:12.345678'):day-time interval>

A Contributor commented:

Have we finalized the SQL names for the new interval types? How about other databases?

@MaxGekk (Member, Author) replied:

Not yet. So far, I took the names for the sub-types from the SQL standard, see #31810.

Probably, we will need to re-define them when we implement parsing of interval types from SQL.

@cloud-fan (Contributor) commented:

thanks, merging to master!

@cloud-fan closed this in 8f8bac6 on Apr 12, 2021
@SparkQA commented Apr 12, 2021

Test build #137200 has finished for PR 32099 at commit 77248be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
