@beliefer beliefer commented Apr 26, 2021

What changes were proposed in this pull request?

This PR lets JDBC clients identify ANSI interval columns properly.

Why are the changes needed?

This PR is similar to #29539.
JDBC users can query interval values through the Thrift server and create views with ANSI interval columns, e.g.
CREATE global temp view view1 as select interval '1-1' year to month as I;
However, when they request the column details of view1, the call fails with Unrecognized type name: YEAR-MONTH INTERVAL

Caused by: java.lang.IllegalArgumentException: Unrecognized type name: YEAR-MONTH INTERVAL
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.toJavaSQLType(SparkGetColumnsOperation.scala:190)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$addToRowSet$1(SparkGetColumnsOperation.scala:206)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.addToRowSet(SparkGetColumnsOperation.scala:198)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$7(SparkGetColumnsOperation.scala:109)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$7$adapted(SparkGetColumnsOperation.scala:109)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$5(SparkGetColumnsOperation.scala:109)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$5$adapted(SparkGetColumnsOperation.scala:107)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.runInternal(SparkGetColumnsOperation.scala:107)
	... 34 more

Does this PR introduce any user-facing change?

Yes. Hive JDBC clients can now recognize ANSI interval columns.

How was this patch tested?

Jenkins test.

@github-actions github-actions bot added the SQL label Apr 26, 2021

SparkQA commented Apr 26, 2021

Test build #137936 has finished for PR 32345 at commit 23c8cf2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42458/


SparkQA commented Apr 26, 2021

Test build #137939 has finished for PR 32345 at commit 211dbf2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42461/

@beliefer beliefer changed the title from "[WIP][SPARK-35085][SQL] Get columns operation should handle ANSI interval column properly" to "[SPARK-35085][SQL] Get columns operation should handle ANSI interval column properly" Apr 26, 2021
@beliefer
Contributor Author

ping @MaxGekk

Member

@MaxGekk MaxGekk left a comment

@beliefer Could you fill in the PR description, please.

@beliefer
Contributor Author

> @beliefer Could you fill in the PR description, please.

I'm sorry! I forgot it.

Comment on lines 174 to 175
case IntegerType | YearMonthIntervalType => java.sql.Types.INTEGER
case LongType | DayTimeIntervalType => java.sql.Types.BIGINT
Member

hmm, I am not sure that we should expose ANSI intervals as raw integers/longs via JDBC. I would consider strings (preferable) or java.time.Duration/Period. @cloud-fan @srielau WDYT?

Contributor Author

@beliefer beliefer Apr 26, 2021

Good question! I am unsure about this too.
cc @cloud-fan

Contributor

This is the metadata; where do we handle the data? E.g., if we want to return a string or Duration/Period, where should we instantiate the string or Duration/Period values?

Member

See the PR #32121. We should return strings, since there is no appropriate type in java.sql.* for intervals. I believe we should return the same as for CalendarIntervalType: java.sql.Types.OTHER.

@beliefer Could you handle YearMonthIntervalType and DayTimeIntervalType separately, and return java.sql.Types.OTHER?

Contributor Author

OK
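
The mapping requested in this thread can be illustrated with a minimal, self-contained sketch. It does not use Spark's own classes: the `SqlType` trait, its case objects, and the `JdbcTypeMapping` object are hypothetical stand-ins, and only the `java.sql.Types` constants are real JDK API. It assumes the approach described above, where each ANSI interval type gets its own case and reports `java.sql.Types.OTHER`, like `CalendarIntervalType`.

```scala
// Hypothetical stand-ins for Spark's data types (not org.apache.spark.sql.types).
sealed trait SqlType
case object IntegerType extends SqlType
case object LongType extends SqlType
case object StringType extends SqlType
case object CalendarIntervalType extends SqlType
case object YearMonthIntervalType extends SqlType
case object DayTimeIntervalType extends SqlType
case object UnknownType extends SqlType

object JdbcTypeMapping {
  // Maps a column type to a java.sql.Types code for JDBC metadata.
  def toJavaSQLType(typ: SqlType): Int = typ match {
    case IntegerType => java.sql.Types.INTEGER
    case LongType => java.sql.Types.BIGINT
    case StringType => java.sql.Types.VARCHAR
    case CalendarIntervalType => java.sql.Types.OTHER
    // Each ANSI interval type is handled in its own case and reported as OTHER,
    // because java.sql.* has no dedicated interval type.
    case YearMonthIntervalType => java.sql.Types.OTHER
    case DayTimeIntervalType => java.sql.Types.OTHER
    // Anything without a mapping fails in the same style as the reported error.
    case other =>
      throw new IllegalArgumentException(s"Unrecognized type name: $other")
  }
}
```

Keeping the interval types out of the `IntegerType` and `LongType` cases avoids advertising them to clients as raw numbers while the row data itself is returned as strings.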

  private def getColumnSize(typ: DataType): Option[Int] = typ match {
    case dt @ (BooleanType | _: NumericType | DateType | TimestampType |
-        CalendarIntervalType | NullType) =>
+        CalendarIntervalType | NullType | YearMonthIntervalType | DayTimeIntervalType) =>
Member

What size does this return? CalendarIntervalType, YearMonthIntervalType, and DayTimeIntervalType are returned as strings in rowSets.

Contributor Author

CalendarIntervalType returns its defaultSize (4 + 4 + 8 = 16).
It seems YearMonthIntervalType and DayTimeIntervalType should return their defaultSize too.
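
A small sketch of the column-size rule discussed here, using hypothetical stand-in types rather than Spark's classes. The `SizedType` trait, the `*T` objects, and `ColumnSize` are invented for illustration. The sizes 4 and 8 for the ANSI interval types are an assumption based on their being stored as an Int and a Long respectively, and returning `None` for strings is likewise assumed.

```scala
// Hypothetical stand-ins; not Spark's DataType hierarchy.
sealed trait SizedType { def defaultSize: Int }
// months (4 bytes) + days (4 bytes) + microseconds (8 bytes)
case object CalendarIntervalT extends SizedType { val defaultSize: Int = 4 + 4 + 8 }
// Assumed: backed by an Int number of months.
case object YearMonthIntervalT extends SizedType { val defaultSize: Int = 4 }
// Assumed: backed by a Long number of microseconds.
case object DayTimeIntervalT extends SizedType { val defaultSize: Int = 8 }
// Variable-length type; its defaultSize is not reported as a column size.
case object StringT extends SizedType { val defaultSize: Int = 20 }

object ColumnSize {
  // Fixed-width types report their defaultSize; variable-length types report nothing.
  def getColumnSize(typ: SizedType): Option[Int] = typ match {
    case dt @ (CalendarIntervalT | YearMonthIntervalT | DayTimeIntervalT) =>
      Some(dt.defaultSize)
    case StringT => None
  }
}
```

Under these assumptions the reported size describes the internal representation, not the length of the string that the row set carries, which is the mismatch the question above points at.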


SparkQA commented Apr 28, 2021

Test build #138023 has finished for PR 32345 at commit 4c594bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 28, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42542/

Member

MaxGekk commented Apr 28, 2021

+1, LGTM. Merging to master.
Thank you @beliefer, and thank you @cloud-fan for the review.

@MaxGekk MaxGekk closed this in 56bb815 Apr 28, 2021
@beliefer
Contributor Author

@MaxGekk @cloud-fan Thank you.

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM2
