-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28794][SQL][DOC] Documentation for Create table Command #26759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-28794][SQL][DOC] Documentation for Create table Command #26759
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some proofreading, here and below. Space needs to follow punctuation.
"Data source" needs to be consistent and refer to the argument above.
"using which the table is created" -> used to create the table.
But can you say any more about what this means? This isn't really adding much documentation.
|
This still has a lot of basic syntax, grammar and formatting problems. Please proofread per above. |
18be12a to
70cae22
Compare
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docs need to have a little more, well, documentation here. It's just repeating what the syntax implies already. It doesn't need to be super in depth, but, it's worth asking: what is a reader getting out of this page that they won't already know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it sound like DATASOURCE is a keyword. Don't you want to write something like USING data_source above? and match it below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's list all the possible valid values at the moment, or link to them somehow. I don't think the implementation detail in the last clause is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove 'are'. Can we provide any links to what bucketing means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awkwardly worded. "Sets key-value properties on the table, such as ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we say a little more -- it's a path to a directory, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Hive format mean here? (for the reader)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not INSERT but CREATE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about create -> define to avoid doubly saying create...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adds SORTED BY:
spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Line 289 in 18e8d1d
| (SORTED BY orderedIdentifierList)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add CREATE TABLE LIKE, too?
spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Line 129 in 18e8d1d
| | CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the other PARTITIONED BY case:
spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Line 121 in 18e8d1d
| PARTITIONED BY partitionColumnNames=identifierList) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SKEWED BY:
spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Line 123 in 18e8d1d
| skewSpec | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu I think it is not supported.I checked SparkSqlParser class and found this.

|
kindly ping again, @PavithraRamachandran |
|
I shall update the PR , with the necessary corrections as per the review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not to create a Hive serde table since Spark 3.0. See #26736
70cae22 to
62502a2
Compare
|
ok to test |
|
Test build #116731 has finished for PR 26759 at commit
|
|
@PavithraRamachandran there are still unresolved comments from a month ago. Please address all of them. |
|
@PavithraRamachandran If you don't have enogh time to keep this, I can take this over. |
|
i shall complete today |
62502a2 to
50996f2
Compare
|
Test build #116818 has finished for PR 26759 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps change the column names to id, name, age to be more meaningful ? Also can you please put semi colon at the end in the examples just to be consistent with other docs ?
cc @huaxingao can you please check on the consistency part if you have some time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we say "input format" instead of "file format". For example, JDBC is data source is not a file format, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STORED AS ?
|
Test build #116836 has finished for PR 26759 at commit
|
f835058 to
b7dab5d
Compare
|
Test build #116848 has finished for PR 26759 at commit
|
|
Test build #116860 has finished for PR 26759 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sql-ref-syntax-ddl-create-table-like.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More options here:
spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Line 131 in 3848999
| | CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please use table_identifier instead of [db_name.]table_name? Put the syntax of table_identifier in Parameters section. You can refer to any of the docs that has table_identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to make all the docs follow the same convention: put a space in between the symbols (e.g. '|', '=', '[]') and text. Refer to sql-ref-syntax-ddl-create-database as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add ; in the end of all the sql statements in example sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a Related Statements section to link the related statements?
c545e9a to
bc2aef8
Compare
|
Test build #116953 has finished for PR 26759 at commit
|
|
Test build #116955 has finished for PR 26759 at commit
|
|
|
||
| ### Related Statements | ||
| * [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html) | ||
| * [CREATE TABLE LIKE](ssql-ref-syntax-ddl-create-table-like.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link is broken. You have an extra s in ssql
|
|
||
| ### Related Statements | ||
| * [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html) | ||
| * [CREATE TABLE LIKE](ssql-ref-syntax-ddl-create-table-like.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
broken link
| The CREATE statements: | ||
| * [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html) | ||
| * [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html) | ||
| * [CREATE TABLE LIKE](ssql-ref-syntax-ddl-create-table-like.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
broken link
| USING data_source | ||
| [ OPTIONS ( key1=val1, key2=val2, ... ) ] | ||
| [ PARTITIONED BY ( col_name1, col_name2, ... ) ] | ||
| [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier | ||
| [ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ] | ||
| [ COMMENT table_comment ] | ||
| [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... ) | ( col_name1, col_name2, ... ) ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break the line
|
Test build #116961 has finished for PR 26759 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth another proofreading too
|
|
||
| <dl> | ||
| <dt><code><em>ROW FORMAT</em></code></dt> | ||
| <dd>SERDE is used to specify a custom SerDe or the DELIMITED clause inorder to use the native SerDe.</dd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inorder -> in order
|
|
||
| <dl> | ||
| <dt><code><em>LOCATION</em></code></dt> | ||
| <dd>Path to the directory where table data is stored, could be filesystem, HDFS, etc.</dd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, better as "... data is stored, which could be a path on distributed storage like HDFS, etc."
| <dl> | ||
| <dt><code><em>TBLPROPERTIES</em></code></dt> | ||
| <dd> | ||
| Table properties that has to be set are specified,such as `created.by.user`, `owner`, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that have to be set
space after comma
|
Test build #117244 has finished for PR 26759 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close enough, I think.
|
Merged to master |

What changes were proposed in this pull request?
Document CREATE TABLE statement in SQL Reference Guide.
Why are the changes needed?
Adding documentation for SQL reference.
Does this PR introduce any user-facing change?
yes
Before:
There was no documentation for this.
How was this patch tested?
Used jekyll build and serve to verify.