Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

We made a mistake in #16944 . In HiveMetastoreCatalog#inferIfNeeded we infer the data schema, merge with full schema, and return the new full schema. At caller side we treat the full schema as data schema and set it to HadoopFsRelation.

This doesn't cause any problem because both parquet and orc can work with a wrong data schema that has extra columns, but it's better to fix this mistake.

How was this patch tested?

N/A

@cloud-fan
Copy link
Contributor Author

cc @budde @gatorsmile

@SparkQA
Copy link

SparkQA commented Oct 31, 2017

Test build #83234 has finished for PR 19615 at commit 46f530f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

also cc @dongjoon-hyun @viirya

@viirya
Copy link
Member

viirya commented Oct 31, 2017

LGTM


val (dataSchema, updatedTable) =
inferIfNeeded(relation, options, fileFormat, Option(fileIndex))
val updatedTable = inferIfNeeded(relation, options, fileFormat, Option(fileIndex))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we missed this. Actually the variable name is indicating it is data schema...

asfgit pushed a commit that referenced this pull request Oct 31, 2017
…eCatalog.convertToLogicalRelation

We made a mistake in #16944 . In `HiveMetastoreCatalog#inferIfNeeded` we infer the data schema, merge with full schema, and return the new full schema. At caller side we treat the full schema as data schema and set it to `HadoopFsRelation`.

This doesn't cause any problem because both parquet and orc can work with a wrong data schema that has extra columns, but it's better to fix this mistake.

N/A

Author: Wenchen Fan <[email protected]>

Closes #19615 from cloud-fan/infer.

(cherry picked from commit 4d9ebf3)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor Author

thanks, merging to master/2.2!

@asfgit asfgit closed this in 4d9ebf3 Oct 31, 2017
@dongjoon-hyun
Copy link
Member

Thank you so much, @cloud-fan !

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…eCatalog.convertToLogicalRelation

We made a mistake in apache#16944 . In `HiveMetastoreCatalog#inferIfNeeded` we infer the data schema, merge with full schema, and return the new full schema. At caller side we treat the full schema as data schema and set it to `HadoopFsRelation`.

This doesn't cause any problem because both parquet and orc can work with a wrong data schema that has extra columns, but it's better to fix this mistake.

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#19615 from cloud-fan/infer.

(cherry picked from commit 4d9ebf3)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants