Closed
Description
Describe the enhancement requested
Background: There was an effort to fix inconsistent timestamp types across different SQL-on-Hadoop engines: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q
Apache ORC provides two timestamp types:
- TIMESTAMP: timestamp type without timezone; the timestamp value is stored in the writer timezone.
- TIMESTAMP_INSTANT: timestamp type with local timezone; the timestamp value is stored in the UTC timezone.
arrow::TimestampType has an optional timezone field: https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.h#L1385
- If a timezone is provided, values are normalized to UTC.
- If the timezone is missing, values can be in any timezone.
Therefore, the type mapping should be as below:
- orc::TIMESTAMP <=> arrow::TimestampType w/o timezone
- orc::TIMESTAMP_INSTANT <=> arrow::TimestampType w/ timezone
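The proposed mapping can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: `OrcTimestampKind`, `ArrowTimestampSketch`, `ToArrow`, and `ToOrc` are hypothetical stand-ins, not the real ORC or Arrow APIs, and the choice of `"UTC"` as the timezone string for TIMESTAMP_INSTANT is an assumption rather than something the adapter is known to do.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the two ORC timestamp kinds discussed above.
enum class OrcTimestampKind { TIMESTAMP, TIMESTAMP_INSTANT };

// Hypothetical stand-in for arrow::TimestampType, modeling only its
// optional timezone field.
struct ArrowTimestampSketch {
  std::optional<std::string> timezone;  // no value => timestamp w/o timezone
};

// ORC -> Arrow: only TIMESTAMP_INSTANT carries a timezone.
// Assumption: "UTC" is used as the timezone, since values are stored in UTC.
ArrowTimestampSketch ToArrow(OrcTimestampKind kind) {
  if (kind == OrcTimestampKind::TIMESTAMP_INSTANT) {
    return {std::string("UTC")};
  }
  return {std::nullopt};
}

// Arrow -> ORC: the presence of a timezone selects TIMESTAMP_INSTANT.
OrcTimestampKind ToOrc(const ArrowTimestampSketch& type) {
  return type.timezone.has_value() ? OrcTimestampKind::TIMESTAMP_INSTANT
                                   : OrcTimestampKind::TIMESTAMP;
}
```

With this shape the mapping round-trips: converting either ORC kind to the Arrow-side sketch and back yields the original kind.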
Component(s)
C++