PARQUET-1504: Add an option to convert Parquet Int96 to Arrow Timestamp #594
Merged
Is it possible that in the future we will need to convert INT96 to some other Arrow type (or need any other type conversion)? If that is likely to happen, it would perhaps be better to pass the parameter in a configuration object (such as `org.apache.hadoop.conf.Configuration`).
In the context of Parquet files, `INT96` was really only used for nanosecond timestamps. Thus we probably don't need to provide an alternative option to convert it to another type.

Please note that the use of `INT96` is discouraged in general. There is now a new `TIMESTAMP_NANOS` type to replace it, and `TIMESTAMP_MILLIS` is the general default for timestamps nowadays.
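For readers unfamiliar with the layout, here is a minimal sketch of how an `INT96` timestamp is commonly decoded, assuming the usual convention of 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of Julian day number. The class and method names (`Int96Sketch`, `toEpochNanos`, `encode`) are hypothetical and are not taken from this PR or from the parquet-mr API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Sketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    public static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    public static final long NANOS_PER_DAY = 86_400L * 1_000_000_000L;

    /** Decodes a 12-byte INT96 value into nanoseconds since the Unix epoch. */
    public static long toEpochNanos(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7
        long julianDay = buf.getInt();   // bytes 8..11
        // Exact arithmetic so that out-of-range values fail loudly instead of wrapping.
        long days = julianDay - JULIAN_DAY_OF_EPOCH;
        return Math.addExact(Math.multiplyExact(days, NANOS_PER_DAY), nanosOfDay);
    }

    /** Builds an INT96 value in the same assumed layout (illustration and tests only). */
    public static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        System.out.println(toEpochNanos(encode(2_440_588, 0L))); // prints 0
    }
}
```

A signed 64-bit nanosecond count covers roughly the years 1677 to 2262, so the sketch throws on values outside that range rather than silently overflowing.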
In practice, INT96 is only used for timestamps; however, its semantics depend on which component wrote the file. Hive, Spark and Impala all contain a check `footerFileMetaData.getCreatedBy().startsWith("parquet-mr")` or similar, because if the file was written by Hive, Spark, or any other application using the parquet-mr library then the timestamps are normalized to UTC, but if the file was written by Impala then they have LocalDateTime semantics.
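To make the distinction concrete, the following is a rough sketch of such a writer check, assuming the decoded value is already available as nanoseconds since the epoch. The class `Int96SemanticsSketch`, its methods, and the `created_by` strings in the usage note are hypothetical illustrations, not code from Hive, Spark, Impala, or this PR.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class Int96SemanticsSketch {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Mirrors the created-by prefix check described above. */
    public static boolean isUtcNormalized(String createdBy) {
        return createdBy != null && createdBy.startsWith("parquet-mr");
    }

    /** Reads the value as a point on the UTC timeline. */
    public static Instant asInstant(long epochNanos) {
        return Instant.ofEpochSecond(
                Math.floorDiv(epochNanos, NANOS_PER_SECOND),
                Math.floorMod(epochNanos, NANOS_PER_SECOND));
    }

    /** Reads the value as a wall-clock date and time with no time zone attached. */
    public static LocalDateTime asLocalDateTime(long epochNanos) {
        return LocalDateTime.ofEpochSecond(
                Math.floorDiv(epochNanos, NANOS_PER_SECOND),
                (int) Math.floorMod(epochNanos, NANOS_PER_SECOND),
                ZoneOffset.UTC); // offset used only to split the count into fields
    }

    /** Picks the interpretation based on the writer recorded in the footer. */
    public static String describe(String createdBy, long epochNanos) {
        return isUtcNormalized(createdBy)
                ? asInstant(epochNanos).toString()
                : asLocalDateTime(epochNanos).toString();
    }
}
```

For example, `describe("parquet-mr version 1.10.0", 0L)` yields `1970-01-01T00:00:00Z`, while `describe("impala version 3.0", 0L)` yields `1970-01-01T00:00`; the same stored bits are read either as an instant or as a zone-less local time.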
We need to pass this later on as an option from the outside. The current routines don't know anything about the file creator. @yongyanw can you add this as a TODO comment and open a JIRA for it? I don't expect that we can solve this in this PR.
PARQUET-1511 was created and a TODO comment was added.