@MaxGekk MaxGekk commented Apr 10, 2021

What changes were proposed in this pull request?

  1. Map Catalyst's interval types to Hive's types:
    • YearMonthIntervalType -> interval_year_month
    • DayTimeIntervalType -> interval_day_time
  2. Invoke HiveResult.toHiveString() to convert the external interval types java.time.Period/java.time.Duration to strings.
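The first point can be pictured as a plain lookup table (a hypothetical, simplified sketch with string keys; the real change maps Catalyst DataType objects inside the Thrift server code):

```scala
// Illustrative sketch only, not Spark's actual code: the Catalyst -> Hive
// type-name mapping described in point 1 above.
val catalystToHiveTypeName: Map[String, String] = Map(
  "YearMonthIntervalType" -> "interval_year_month",
  "DayTimeIntervalType"   -> "interval_day_time"
)

println(catalystToHiveTypeName("DayTimeIntervalType"))  // interval_day_time
```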

Why are the changes needed?

  1. To be able to retrieve ANSI intervals via Hive Thrift server.
  2. This fixes the issue:
 $ ./sbin/start-thriftserver.sh
 $ ./bin/beeline
Beeline version 2.3.8 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/default "" "" ""
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.2.0-SNAPSHOT)
0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
Error: java.lang.IllegalArgumentException: Unrecognized type name: day-time interval (state=,code=0)
  3. It should unblock #32099 ([SPARK-34905][SQL][TESTS] Enable ANSI intervals in SQLQueryTestSuite/ThriftServerQueryTestSuite), which enables *.sql tests in ThriftServerQueryTestSuite.

Does this PR introduce any user-facing change?

Yes. After the changes:

0: jdbc:hive2://localhost:10000/default> select timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31';
+----------------------------------------------------+
| subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31') |
+----------------------------------------------------+
| 1 01:02:03.000001000                               |
+----------------------------------------------------+
1 row selected (1.637 seconds)

How was this patch tested?

By running the new tests:

$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkThriftServerProtocolVersionsSuite"
$ ./build/sbt -Phive -Phive-thriftserver "test:testOnly *SparkMetadataOperationSuite"

Also checked an array containing an interval:

0: jdbc:hive2://localhost:10000/default> select array(timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31');
+----------------------------------------------------+
| array(subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31')) |
+----------------------------------------------------+
| [1 01:02:03.000001000]                             |
+----------------------------------------------------+

@github-actions github-actions bot added the SQL label Apr 10, 2021

test(s"SPARK-35017: $version get day-time interval type") {
Member Author

I tested only the day-time interval because constructing a year-month interval in SQL is not possible at the moment. I created a JIRA for that: SPARK-35018

Member Author

The PR #32240 adds a test for year-month intervals.


SparkQA commented Apr 10, 2021

Test build #137168 has finished for PR 32121 at commit 1c6c93d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41746/


SparkQA commented Apr 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41746/

      timeFormatters)
    case _: ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] =>
      to += toHiveString((from.get(ordinal), dataTypes(ordinal)), false, timeFormatters)
    case YearMonthIntervalType =>
Member

How about nested types?

Member Author

If you mean array and map types: they are converted to strings, so they don't have a structure that the Hive library could recognize. I don't think I can do anything about that in this PR. It is a restriction of the current implementation and should be solved in a general way.

Member Author

For example, after the PR #32120 is merged:

0: jdbc:hive2://localhost:10000/default> select array(timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31');
+----------------------------------------------------+
| array(subtracttimestamps(TIMESTAMP '2021-01-01 01:02:03.000001', DATE '2020-12-31')) |
+----------------------------------------------------+
| [1 01:02:03.000001000]                             |
+----------------------------------------------------+

Member

I mean that the client side gets plain YearMonthIntervalType/DayTimeIntervalType values via HiveIntervalYearMonth/HiveIntervalDayTime.toString, but the ones inside nested types are handled in toHiveString. They look the same, but these branches seem redundant and may cause inconsistency if the underlying Hive implementation changes.

Would

case YearMonthIntervalType | DayTimeIntervalType | _: ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] =>

be better, so that we handle them consistently on the Spark side?
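The proposal relies on Scala's pattern alternation: one case arm with `|` handles several types in the same branch. A minimal standalone sketch of just that syntax, using toy types (not Spark's real DataType hierarchy):

```scala
// Toy type hierarchy, invented for illustration only.
sealed trait DataType
case object StringType extends DataType
case object YearMonthIntervalType extends DataType
case object DayTimeIntervalType extends DataType
final case class ArrayType(elem: DataType) extends DataType

// A single case arm with `|` alternation routes all listed types
// through the same (here: "format on the Spark side") branch.
def formattedBySpark(dt: DataType): Boolean = dt match {
  case YearMonthIntervalType | DayTimeIntervalType | _: ArrayType => true
  case _ => false
}

println(formattedBySpark(DayTimeIntervalType))  // true
println(formattedBySpark(StringType))           // false
```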

Member Author

If I do as you propose, I get this exception in my test:

[info]   Cause: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.hive.common.type.HiveIntervalDayTime
[info]   at org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:174)
[info]   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
[info]   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)

at

    case INTERVAL_YEAR_MONTH_TYPE:
      return stringValue((HiveIntervalYearMonth) value);
    case INTERVAL_DAY_TIME_TYPE:
      return stringValue((HiveIntervalDayTime) value);

Probably I could just replace the cast with a string cast. Let me try.
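For reference, a hypothetical pure-Scala sketch (the helper name is invented) of the string form in question: Hive prints a day-time interval as days, then HH:MM:SS with nine fractional digits, which matches the `1 01:02:03.000001000` output shown in the PR description.

```scala
import java.time.Duration

// Hypothetical helper, not the PR's actual code: renders a
// java.time.Duration the way Hive prints a day-time interval,
// i.e. "<days> HH:MM:SS.<9-digit nanos>".
def toHiveDayTimeString(d: Duration): String = {
  val sign = if (d.isNegative) "-" else ""
  val abs = d.abs
  f"$sign${abs.toDays}%d ${abs.toHours % 24}%02d:" +
    f"${abs.toMinutes % 60}%02d:${abs.getSeconds % 60}%02d.${abs.getNano}%09d"
}

// timestamp'2021-01-01 01:02:03.000001' - date'2020-12-31'
// is 1 day, 1:02:03 and one microsecond:
val diff = Duration.ofDays(1).plusHours(1).plusMinutes(2).plusSeconds(3).plusNanos(1000)
println(toHiveDayTimeString(diff))  // 1 01:02:03.000001000
```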

Member Author

done

import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

import org.apache.hadoop.hive.common.`type`.{HiveIntervalDayTime, HiveIntervalYearMonth}
Member

Add HiveIntervalDayTime and HiveIntervalYearMonth to

val supportedType: Seq[Type] = {
  Seq(NULL_TYPE, BOOLEAN_TYPE, STRING_TYPE, BINARY_TYPE,
    TINYINT_TYPE, SMALLINT_TYPE, INT_TYPE, BIGINT_TYPE,
    FLOAT_TYPE, DOUBLE_TYPE, DECIMAL_TYPE,
    DATE_TYPE, TIMESTAMP_TYPE,
    ARRAY_TYPE, MAP_TYPE, STRUCT_TYPE, CHAR_TYPE, VARCHAR_TYPE)
}
?

Member Author

done


SparkQA commented Apr 11, 2021

Test build #137181 has finished for PR 32121 at commit feeeea5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 11, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41759/

Member

@yaooqinn yaooqinn left a comment

LGTM

Member

@HyukjinKwon HyukjinKwon left a comment

Makes sense to me.

@HyukjinKwon
Member

Merged to master.

@cloud-fan
Contributor

late LGTM
