[SPARK-34984][SQL] ANSI intervals formatting in hive results #32087
|
@yaooqinn @AngersZhuuuu @gengliangwang @cloud-fan Could you review this PR, please? |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
The failed tests from the … I ran the test suites locally without any problems, and the tests should not be related to this PR. TPC-DS GA failed with a known issue: |
|
Test build #137062 has finished for PR 32087 at commit
|
|
thanks, merging to master! |
|
Does the output match Hive's results? I did a quick test, and it seems not: |
|
I am trying to enable ANSI intervals in
diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 8ca0ab91a7..4b080a99f6 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -120,7 +120,8 @@ private[hive] class SparkExecuteStatementOperation(
(from.getAs[CalendarInterval](ordinal), CalendarIntervalType),
false,
timeFormatters)
- case _: ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] =>
+ case _: ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] |
+ YearMonthIntervalType | DayTimeIntervalType =>
to += toHiveString((from.get(ordinal), dataTypes(ordinal)), false, timeFormatters)
}
}
@@ -377,6 +378,8 @@ object SparkExecuteStatementOperation {
val attrTypeString = field.dataType match {
case NullType => "void"
case CalendarIntervalType => StringType.catalogString
+ case YearMonthIntervalType => "INTERVAL_YEAR_MONTH"
+ case DayTimeIntervalType => "INTERVAL_DAY_TIME"
case other => other.catalogString
}
new FieldSchema(field.name, attrTypeString, field.getComment.getOrElse(""))
but the interval values that are sent via JDBC are not recognized by the Hive lib:
java.lang.IllegalArgumentException: Interval string does not match day-time format of 'd h:m:s.n': INTERVAL '0 16:00:00' DAY TO SECOND
at org.apache.hadoop.hive.common.type.HiveIntervalDayTime.valueOf(HiveIntervalDayTime.java:234)
at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:452)
at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
at org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464)
It seems we have to format intervals specifically for the Thrift server (and Hive libs) in the format: |
|
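For illustration only, and not taken from the PR: the exception above says the Hive client expects day-time intervals shaped like 'd h:m:s.n', without the INTERVAL '...' DAY TO SECOND wrapper. A minimal sketch of producing such a string from a java.time.Duration could look like the following. The object name HiveDayTimeSketch and the helper toHiveDayTimeString are hypothetical, and the zero-padded, nine-digit fraction is an assumption about one acceptable rendering rather than Spark's actual implementation.

```scala
import java.time.Duration

// Hypothetical helper for illustration; not Spark's or Hive's code.
object HiveDayTimeSketch {
  // Renders a duration as "d hh:mm:ss.nnnnnnnnn", the 'd h:m:s.n' shape
  // mentioned in the Hive exception, with a leading '-' for negative values.
  def toHiveDayTimeString(d: Duration): String = {
    val sign = if (d.isNegative) "-" else ""
    val abs = d.abs()                        // work on the magnitude only
    val totalSeconds = abs.getSeconds
    val days = totalSeconds / 86400
    val hours = (totalSeconds % 86400) / 3600
    val minutes = (totalSeconds % 3600) / 60
    val seconds = totalSeconds % 60
    val nanos = abs.getNano                  // 0..999999999
    f"$sign$days $hours%02d:$minutes%02d:$seconds%02d.$nanos%09d"
  }
}
```

Under these assumptions, HiveDayTimeSketch.toHiveDayTimeString(Duration.ofHours(16)) yields 0 16:00:00.000000000 for the 16-hour interval from the stack trace.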
Yea |
|
@cloud-fan @HyukjinKwon I have changed the output of |
What changes were proposed in this pull request?
Extend HiveResult.toHiveString() to support new interval types YearMonthIntervalType and DayTimeIntervalType.
Why are the changes needed?
To fix failures while formatting ANSI intervals as Hive strings. For example:
Does this PR introduce any user-facing change?
Yes. After the changes:
How was this patch tested?
By running new tests: