Conversation

MaxGekk (Member) commented Apr 8, 2021

What changes were proposed in this pull request?

Extend HiveResult.toHiveString() to support the new interval types YearMonthIntervalType and DayTimeIntervalType.
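
A minimal sketch of the shape of the change (the helper names, signatures, and style constant below are assumptions based on the IntervalUtils helpers mentioned later in this thread, not a verbatim copy of the PR):

// Hypothetical new match arms in HiveResult.toHiveString(): year-month intervals
// are stored as months (Int), day-time intervals as microseconds (Long), and both
// are rendered in the ANSI SQL interval syntax.
case (months: Int, YearMonthIntervalType) =>
  IntervalUtils.toYearMonthIntervalString(months, ANSI_STYLE)
case (micros: Long, DayTimeIntervalType) =>
  IntervalUtils.toDayTimeIntervalString(micros, ANSI_STYLE)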

Why are the changes needed?

To fix failures while formatting ANSI intervals as Hive strings. For example:

spark-sql> select timestamp'now' - date'2021-01-01';
21/04/08 09:42:49 ERROR SparkSQLDriver: Failed in [select timestamp'now' - date'2021-01-01']
scala.MatchError: (PT2337H42M46.649S,DayTimeIntervalType) (of class scala.Tuple2)
	at org.apache.spark.sql.execution.HiveResult$.toHiveString(HiveResult.scala:97)

Does this PR introduce any user-facing change?

Yes. After the changes:

spark-sql> select timestamp'now' - date'2021-01-01';
INTERVAL '97 09:37:52.171' DAY TO SECOND

How was this patch tested?

By running new tests:

$ build/sbt -Phive-2.3 -Phive-thriftserver "testOnly *HiveResultSuite"

github-actions bot added the SQL label Apr 8, 2021
MaxGekk (Member, Author) commented Apr 8, 2021

@yaooqinn @AngersZhuuuu @gengliangwang @cloud-fan Could you review this PR, please?

SparkQA commented Apr 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41640/

SparkQA commented Apr 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41640/

MaxGekk (Member, Author) commented Apr 8, 2021

The failed tests are from the "hive - other tests" group:

[info] *** 2 TESTS FAILED ***
[error] Error: Total 687, Failed 5, Errors 88, Passed 594, Ignored 1
[error] Failed tests:
[error] 	org.apache.spark.sql.hive.JavaDataFrameSuite
[error] 	org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
[error] 	org.apache.spark.sql.hive.HiveSharedStateSuite

I ran the test suites locally without any problems, and the failures should not be related to this PR.

The TPC-DS GA job failed with a known issue:

[info] org.apache.spark.sql.TPCDSQueryTestSuite *** ABORTED *** (1 second, 7 milliseconds)
[info]   java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries (on a random free port)! Consider 

SparkQA commented Apr 8, 2021

Test build #137062 has finished for PR 32087 at commit d993e76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented

thanks, merging to master!

cloud-fan closed this in 96a3533 Apr 8, 2021
HyukjinKwon (Member) commented

Does the output match Hive's results? I did a quick test, and it seems not:

0: jdbc:hive2://localhost:10000/default> select INTERVAL '1-2' YEARS TO MONTH;
+------+
| _c0  |
+------+
| 1-2  |
+------+
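
For reference, Hive's textual form for a year-month interval is just years-months. A rough, hypothetical sketch (not code from Hive or from this PR), assuming the value is a number of months:

// Hypothetical: Hive-style "y-m" rendering of a year-month interval,
// e.g. 14 months -> "1-2".
def toHiveYearMonthString(months: Int): String = {
  val sign = if (months < 0) "-" else ""
  s"$sign${math.abs(months) / 12}-${math.abs(months) % 12}"
}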

MaxGekk (Member, Author) commented Apr 9, 2021

I am trying to enable ANSI intervals in ThriftServerQueryTestSuite, see #32099. I made the following changes on the Spark side:

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 8ca0ab91a7..4b080a99f6 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -120,7 +120,8 @@ private[hive] class SparkExecuteStatementOperation(
           (from.getAs[CalendarInterval](ordinal), CalendarIntervalType),
           false,
           timeFormatters)
-      case _: ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] =>
+      case _: ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] |
+          YearMonthIntervalType | DayTimeIntervalType =>
         to += toHiveString((from.get(ordinal), dataTypes(ordinal)), false, timeFormatters)
     }
   }
@@ -377,6 +378,8 @@ object SparkExecuteStatementOperation {
       val attrTypeString = field.dataType match {
         case NullType => "void"
         case CalendarIntervalType => StringType.catalogString
+        case YearMonthIntervalType => "INTERVAL_YEAR_MONTH"
+        case DayTimeIntervalType => "INTERVAL_DAY_TIME"
         case other => other.catalogString
       }
       new FieldSchema(field.name, attrTypeString, field.getComment.getOrElse(""))

but the interval values that are sent via JDBC are not recognized by the Hive library:

java.lang.IllegalArgumentException: Interval string does not match day-time format of 'd h:m:s.n': INTERVAL '0 16:00:00' DAY TO SECOND
	at org.apache.hadoop.hive.common.type.HiveIntervalDayTime.valueOf(HiveIntervalDayTime.java:234)
	at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:452)
	at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:424)
	at org.apache.hive.jdbc.HiveBaseResultSet.getObject(HiveBaseResultSet.java:464)

It seems we have to format intervals specially for the Thrift server (and the Hive libraries), using the format expected here:
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveIntervalDayTime.java#L244-L245
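
For illustration only, a rough sketch (my own, not the PR's or Hive's code) of rendering a day-time interval stored as microseconds in that d h:m:s.n layout:

// Hypothetical formatter producing the layout HiveIntervalDayTime.valueOf expects,
// e.g. 16 hours -> "0 16:00:00.000000000".
def toHiveDayTimeString(micros: Long): String = {
  val microsPerSecond = 1000000L
  val sign = if (micros < 0) "-" else ""
  var rest = math.abs(micros)
  val days = rest / (86400L * microsPerSecond); rest %= 86400L * microsPerSecond
  val hours = rest / (3600L * microsPerSecond); rest %= 3600L * microsPerSecond
  val minutes = rest / (60L * microsPerSecond); rest %= 60L * microsPerSecond
  val seconds = rest / microsPerSecond
  val nanos = (rest % microsPerSecond) * 1000L
  f"$sign$days%d $hours%02d:$minutes%02d:$seconds%02d.$nanos%09d"
}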
cc @cloud-fan @HyukjinKwon @srielau

cloud-fan (Contributor) commented

Yeah, toHiveResult can't use toYearMonthIntervalString directly.

MaxGekk (Member, Author) commented Apr 10, 2021

@cloud-fan @HyukjinKwon I have changed the output of the HiveResult methods. Please review #32120.
