@marmbrus
Contributor

This is the first of a few improvements I'd like to make to the data sources API. Note that this does not correctly handle multiple databases. However, since much of the existing code does not either, and since I don't think fixing this will change the public API, I think it's reasonable to handle that in a separate PR.

@SparkQA

SparkQA commented Dec 20, 2014

Test build #24677 has started for PR 3752 at commit 055869e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 20, 2014

Test build #24677 has finished for PR 3752 at commit 055869e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24677/

@SparkQA

SparkQA commented Dec 21, 2014

Test build #24689 has started for PR 3752 at commit 1002d20.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 21, 2014

Test build #24689 has finished for PR 3752 at commit 1002d20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24689/

Contributor

This can be used to retrieve all SerDe properties:

table.getTTable.getSd.getSerdeInfo.getParameters

Contributor

Also, I think SerDeInfo might not be the proper place to put external data source table options. Semantically, these options are more like general table properties, so using table.putProperty might be better.

@yhuai Are there notable differences between SerDe properties and general table properties in Hive? More specifically, are there differences between the properties saved in `metastore.Table.getTTable.getParameters` and those in `metastore.Table.getTTable.getSd.getSerdeInfo.getParameters`?

Contributor Author

My reasoning here was that table properties might have other things that could conflict with the options that a data source requires. It seems like SerDe properties are scoped to hold just the options that describe how the serialization library should read the data (which seems analogous to our data sources).
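
To make the scoping argument concrete, here is a minimal sketch. It does not use the Hive metastore classes: `TableSketch` and `SerDeInfoSketch` are hypothetical stand-ins that only model the two property maps, and the key names are made up for illustration.

```scala
// Hypothetical stand-ins for the metastore table metadata; not the real Hive classes.
final case class SerDeInfoSketch(parameters: Map[String, String])

final case class TableSketch(
    name: String,
    tableProperties: Map[String, String], // general table properties
    serde: SerDeInfoSketch)               // SerDe-scoped properties

object OptionScoping {
  // Store data source options in the SerDe-scoped map only,
  // leaving the general table properties untouched.
  def withDataSourceOptions(t: TableSketch, options: Map[String, String]): TableSketch =
    t.copy(serde = SerDeInfoSketch(t.serde.parameters ++ options))

  // Read back exactly the options that should be handed to the data source.
  def dataSourceOptions(t: TableSketch): Map[String, String] =
    t.serde.parameters
}
```

With this layout, a data source option that happens to share a key with a general table property (say, an assumed `comment` key) does not overwrite it, and reading the options back does not pick up unrelated table properties.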

Contributor

Thanks for the explanations. This sounds good to me, then.

@SparkQA

SparkQA commented Dec 22, 2014

Test build #24707 has started for PR 3752 at commit e23f8fb.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 22, 2014

Test build #24708 has started for PR 3752 at commit 563ff40.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 22, 2014

Test build #24707 has finished for PR 3752 at commit e23f8fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24707/

@SparkQA

SparkQA commented Dec 22, 2014

Test build #24708 has finished for PR 3752 at commit 563ff40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24708/

@marmbrus changed the title from "[SPARK-4912][SQL] Persistent tables for the Spark SQL data sources" to "[SPARK-4912][SQL] Persistent tables for the Spark SQL data sources api" on Dec 30, 2014
Contributor

We are using SerDe properties to store all parameters that will be passed to a relation provider (for creating a relation), right? We should probably add a comment here.
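
As a rough illustration of that round trip, the sketch below passes a persisted parameter map straight to a provider. Everything here is a hypothetical stand-in: `RelationSketch`, `RelationProviderSketch`, `PathProvider`, and the `path` key are invented for the example, and the real Spark interfaces also take a `SQLContext`.

```scala
// Hypothetical stand-ins; not the real Spark SQL data source interfaces.
trait RelationSketch { def describe: String }

trait RelationProviderSketch {
  def createRelation(parameters: Map[String, String]): RelationSketch
}

// A toy provider that only needs a "path" option (key name assumed).
object PathProvider extends RelationProviderSketch {
  def createRelation(parameters: Map[String, String]): RelationSketch = {
    val path = parameters.getOrElse("path", sys.error("missing option: path"))
    new RelationSketch { val describe = s"relation over $path" }
  }
}

object ResolvePersistedTable {
  // serdeParameters stands in for the map read back from the metastore;
  // it is forwarded unchanged to the provider.
  def resolve(
      provider: RelationProviderSketch,
      serdeParameters: Map[String, String]): RelationSketch =
    provider.createRelation(serdeParameters)
}
```

The point of the sketch is only that the stored map is the provider's complete input, which is why a comment at the storage site would help readers.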

@yhuai
Contributor

yhuai commented Jan 21, 2015

Let's close it since #3960 has been merged.

@marmbrus closed this on Jan 21, 2015
@marmbrus deleted the persistentDataSourceTables branch on March 9, 2015, 20:16