-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-4912][SQL] Persistent tables for the Spark SQL data sources api #3752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #24677 has started for PR 3752 at commit
|
|
Test build #24677 has finished for PR 3752 at commit
|
|
Test FAILed. |
|
Test build #24689 has started for PR 3752 at commit
|
|
Test build #24689 has finished for PR 3752 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be used to retrieve all SerDe properties:
table.getTTable.getSd.getSerdeInfo.getParametersThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think SerDeInfo might not be a proper place to put external data source table options. Semantically, these options are more like general table properties, thus using table.putProperty might be better.
@yhuai Are there notable differences between SerDe properties and general table properties in Hive? Or to be more specifically, differences between properties saved in metastore.Table.getTTable.getParameters and those in `metastore.Table.getTTable.getSd.getSerdeInfo.getParameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning here was that table properties might have other things that could conflict with the options that a data source requires. It seems like SerDe properties are scoped to hold just the options that describe how the serialization library should read the data (which seems analogous to our data sources).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanations, then this sounds good to me.
|
Test build #24707 has started for PR 3752 at commit
|
|
Test build #24708 has started for PR 3752 at commit
|
|
Test build #24707 has finished for PR 3752 at commit
|
|
Test PASSed. |
|
Test build #24708 has finished for PR 3752 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using serde properties to store all parameters that will be passed to a relation provider (for creating a relation), right? Probably we can add a comment at here.
|
Let's close it since #3960 has been merged. |
This is the first of a few improvements I'd like to make to the data sources API. I'll note that this does not correctly handle multiple databases. However, since much of the code does not, and since I don't think improving this will change the public API, I think its reasonable to handle that in a separate PR.