@windpiger
Contributor

What changes were proposed in this pull request?

If we create an external data source table with a non-qualified location, we should qualify the location before storing it in the catalog.

CREATE TABLE t(a string)
USING parquet
LOCATION '/path/xx'


CREATE TABLE t1(a string, b string)
USING parquet
PARTITIONED BY(b)
LOCATION '/path/xx'

When we get the table from the catalog, the location should be qualified, e.g. 'file:/path/xx'.
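
To make "qualified" concrete, here is a minimal sketch that uses only `java.net.URI`. It is not the code from this PR, which (judging by the test snippet further down) goes through a Hadoop `Path` and its filesystem. The `qualify` helper and its `defaultScheme` parameter are hypothetical names for illustration; in Spark the default filesystem would come from the Hadoop configuration rather than a hard-coded value.

```scala
import java.net.URI

object QualifyLocation {
  // Hypothetical helper: keep the location as-is when it already carries a
  // scheme, otherwise attach `defaultScheme` (an assumption standing in for
  // the default filesystem of the Hadoop configuration).
  // Assumes the location is an absolute path such as "/path/xx".
  def qualify(location: String, defaultScheme: String = "file"): URI = {
    val uri = new URI(location)
    if (uri.getScheme != null) uri
    else new URI(defaultScheme, uri.getAuthority, uri.getPath, uri.getQuery, uri.getFragment)
  }

  def main(args: Array[String]): Unit = {
    println(qualify("/path/xx"))               // file:/path/xx
    println(qualify("hdfs://nn:8020/path/xx")) // hdfs://nn:8020/path/xx
  }
}
```

With this shape, a location that is already qualified passes through unchanged, so applying the helper more than once is harmless.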

How was this patch tested?

Unit test added.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73556 has finished for PR 17095 at commit 570ce24.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73565 has finished for PR 17095 at commit 55c525e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73582 has finished for PR 17095 at commit 18ec570.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73591 has finished for PR 17095 at commit 22bef8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

cc @cloud-fan

validateName(table)
val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))

val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
Contributor

would it be easier if locationUri is of type URI?

Contributor Author

@windpiger Mar 1, 2017

A URI without a scheme is also legal, so this fix is still needed even if locationUri is a URI.
If it were a URI, though, we could do the qualification when the URI is created.
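
A small check of the point above, using `java.net.URI` from the JDK: a location without a scheme parses fine, so changing the field type alone would not guarantee a qualified location. The `SchemelessUri` object and `schemeOf` helper are hypothetical names used only for this sketch.

```scala
import java.net.URI

object SchemelessUri {
  // Hypothetical helper: the scheme of a location, or None when it has none.
  def schemeOf(location: String): Option[String] =
    Option(new URI(location).getScheme)

  def main(args: Array[String]): Unit = {
    println(schemeOf("/path/xx"))             // None
    println(schemeOf("file:/path/xx"))        // Some(file)
    println(new URI("/path/xx").isAbsolute)   // false: no scheme present
  }
}
```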

Contributor

But why do we have to store the fully qualified path? What do we gain from this?

Contributor Author

If the location has no scheme such as hdfs or file, then when we restore it from the metastore we do not know which filesystem the table is stored on.

Contributor

Shall we apply it to all locations, such as the database location and partition locations?

Contributor Author

Yes, this logic should apply to all of them.
The database location already has this logic; shall I add it for partition locations in another PR?

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73760 has finished for PR 17095 at commit 7e08045.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73773 has finished for PR 17095 at commit 9932b03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
assert(table.location == dir.getAbsolutePath)
val dirPath = new Path(dir.getAbsolutePath)
val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
Member

Can you create a helper function to avoid the duplicated code?

Contributor Author

OK, thanks. I added makeQualifiedPath in this PR.

After that PR is merged, I will fix the conflicts and make this change.

private def getDBPath(dbName: String): URI = {
val warehousePath = s"file:${spark.sessionState.conf.warehousePath.stripPrefix("file:")}"
new Path(warehousePath, s"$dbName.db").toUri
val warehousePath = makeQualifiedPath(s"${spark.sessionState.conf.warehousePath}")
Contributor

Just write spark.sessionState.conf.warehousePath; there is no need to wrap it in s"".

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 274973d Mar 9, 2017
@SparkQA

SparkQA commented Mar 9, 2017

Test build #74259 has finished for PR 17095 at commit c7e837c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 9, 2017

Test build #74258 has finished for PR 17095 at commit 0919fea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public static class LongWrapper
  • public static class IntWrapper
  • case class ResolveInlineTables(conf: CatalystConf) extends Rule[LogicalPlan]
  • case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] with PredicateHelper
  • case class JoinPlan(itemIds: Set[Int], plan: LogicalPlan, joinConds: Set[Expression], cost: Cost)
  • case class Cost(rows: BigInt, size: BigInt)
  • abstract class RepartitionOperation extends UnaryNode
  • case class FlatMapGroupsWithState(
  • class CSVOptions(
  • class UnivocityParser(
  • trait WatermarkSupport extends UnaryExecNode
  • case class FlatMapGroupsWithStateExec(

@kayousterhout
Contributor

kayousterhout commented Mar 17, 2017

I suspect that this PR is the cause of consistent failures in the maven build, in the HiveCatalogedDDLSuite unit test: https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.hive.execution.HiveCatalogedDDLSuite&test_name=create+temporary+view+using

Based on the error message: https://spark-tests.appspot.com/test-logs/408097945 it looks like the way the path is getting re-written (I think by the code in this PR) is causing Hadoop's path code to barf. The create temporary view using unit test is the only one in that suite that reads from a CSV file, which would explain why that's the only one that's failing. @windpiger or @cloud-fan would one of you mind looking into this?

I filed a JIRA here: https://issues.apache.org/jira/browse/SPARK-19990

@kayousterhout
Contributor

kayousterhout commented Mar 17, 2017

Sounds like this was caused by a different PR (see the comment on the JIRA) and is now being fixed by @windpiger, so never mind here (and thanks @windpiger for looking into this!)
