Commit c1e7989

Michael Allman authored and cloud-fan committed
[SPARK-20888][SQL][DOCS] Document change of default setting of spark.sql.hive.caseSensitiveInferenceMode
(Link to Jira: https://issues.apache.org/jira/browse/SPARK-20888)

## What changes were proposed in this pull request?

Document the change of the default value of the spark.sql.hive.caseSensitiveInferenceMode configuration key from NEVER_INFER to INFER_AND_SAVE in the Spark SQL 2.1 to 2.2 migration notes.

Author: Michael Allman <[email protected]>

Closes #18112 from mallman/spark-20888-document_infer_and_save.
1 parent 98c3852 commit c1e7989

1 file changed: +5 -1 lines changed

docs/sql-programming-guide.md

Lines changed: 5 additions & 1 deletion
@@ -1223,7 +1223,7 @@ the following case-insensitive options:
 This is a JDBC writer related option. If specified, this option allows setting of database-specific table and partition options when creating a table (e.g., <code>CREATE TABLE t (name string) ENGINE=InnoDB.</code>). This option applies only to writing.
 </td>
 </tr>
-
+
 <tr>
 <td><code>createTableColumnTypes</code></td>
 <td>
@@ -1444,6 +1444,10 @@ options.
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.1 to 2.2
+
+  - Spark 2.1.1 introduced a new configuration key: `spark.sql.hive.caseSensitiveInferenceMode`. It had a default setting of `NEVER_INFER`, which kept behavior identical to 2.1.0. However, Spark 2.2.0 changes this setting's default value to `INFER_AND_SAVE` to restore compatibility with reading Hive metastore tables whose underlying file schema have mixed-case column names. With the `INFER_AND_SAVE` configuration value, on first access Spark will perform schema inference on any Hive metastore table for which it has not already saved an inferred schema. Note that schema inference can be a very time consuming operation for tables with thousands of partitions. If compatibility with mixed-case column names is not a concern, you can safely set `spark.sql.hive.caseSensitiveInferenceMode` to `NEVER_INFER` to avoid the initial overhead of schema inference. Note that with the new default `INFER_AND_SAVE` setting, the results of the schema inference are saved as a metastore key for future use. Therefore, the initial schema inference occurs only at a table's first access.
+
 ## Upgrading From Spark SQL 2.0 to 2.1
 
  - Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
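
The choice described in the new 2.1-to-2.2 migration note can be sketched as a small decision helper. This is only an illustration: `choose_inference_mode` and `conf_args` are hypothetical names, not part of Spark, and the sketch covers only the two values named in the note. It assumes the setting is passed on the command line through the `--conf key=value` option of `spark-submit`.

```python
from typing import List

# Configuration key and values as named in the migration note.
CONF_KEY = "spark.sql.hive.caseSensitiveInferenceMode"


def choose_inference_mode(needs_mixed_case_columns: bool) -> str:
    """Hypothetical helper: pick the value the migration note suggests."""
    # INFER_AND_SAVE (the 2.2.0 default) infers the schema on first access
    # and saves the result in the metastore for later use.
    # NEVER_INFER (the 2.1.1 default) skips schema inference altogether.
    return "INFER_AND_SAVE" if needs_mixed_case_columns else "NEVER_INFER"


def conf_args(needs_mixed_case_columns: bool) -> List[str]:
    """Hypothetical helper: build the --conf arguments for spark-submit."""
    mode = choose_inference_mode(needs_mixed_case_columns)
    return ["--conf", f"{CONF_KEY}={mode}"]


if __name__ == "__main__":
    # A job that never reads tables with mixed-case column names can avoid
    # the first-access inference cost.
    print(" ".join(conf_args(needs_mixed_case_columns=False)))
```

Pinning the value explicitly in this way keeps a job's behavior the same across the 2.1.1 and 2.2.0 defaults, at the cost of having to revisit the choice if tables with mixed-case column names are introduced later.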
