-
Notifications
You must be signed in to change notification settings - Fork 7
ORC-1717: Add geometry and geography types #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you, @wgtmac . Is it for Apache ORC 1.1.0? If this is final, I'm +1. |
|
Also, cc @williamhyun |
|
@dongjoon-hyun This is not finalized yet. I will update it once ready. |
|
Thank you! If it's finalized, please send a head-up email to dev@orc once more. |
|
Thank you for updating. Is this ready, @wgtmac ? |
|
Yes, I believe so. And I will start PoC implementations later. |
|
Got it. When do you want to release
|
|
I think we can wait until both Java and C++ PoC implementations are finished. |
|
+1 for the decision. Thank you for the confirmation. |
|
Thank you. Is this ready, @wgtmac ? |
|
Yes, this is ready for review @dongjoon-hyun cc @ffacs |
| optional TimestampStatistics timestampStatistics = 9; | ||
| optional bool hasNull = 10; | ||
| optional uint64 bytes_on_disk = 11; | ||
| optional CollectionStatistics collection_statistics = 12; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we handle these two above lines independently because this is irrelevant to the geometry, @wgtmac ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created #22 to address this.
| // Statistics specific to Geometry or Geography type | ||
| message GeospatialStatistics { | ||
| // A bounding box of geospatial instances | ||
| optional BoundingBox bbox = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is bbox a well-known name? I'm just wondering if we can use bounding_box like the other field names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, bbox is a well-known geospatial acronym.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @wgtmac .
I have three comments.
bytes_on_diskandcollection_statistics(https://github.com/apache/orc-format/pull/18/files#r2032264096)bbox(https://github.com/apache/orc-format/pull/18/files#r2032265481)- This PR is proposed to support Apache Iceberg. However, are you sure that Apache Iceberg community keeps Apache ORC support in the next table format?
If we are not sure about (3), let's revise the PR description by avoiding mentioning Apache Iceberg community.m
I believe this PR itself has enough meaning independently to Apache ORC community. I'm +1 for this additional feature.
cc @williamhyun too
|
Gentle ping, @wgtmac . |
I'm not sure about the long-term goal. The recent discussion w.r.t the file format API still considers Apache ORC: https://lists.apache.org/thread/ovyh52m2b6c1hrg4fhw3rx92bzr793n2 |
|
Thank you, @wgtmac . I'm referring the latest Apache Iceberg community sync-up meeting on March 16th which superseded the above discussion on February 12th. |
|
Thanks for the info! I just watched that video and understand the concern from the Iceberg community. There are several issues like missing features (variant, geometry, etc.) and implementation (default values, schema evolution, etc.) For the latter issue, does anyone in the Apache ORC community know the detail? |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you for updating, @wgtmac .
|
Since this is an improvement without a breaking change, shall we release as Apache ORC v1.1.0? WDYT, @wgtmac ? |
|
@dongjoon-hyun Sounds good! I think this can help the implementation a lot. |
### What changes were proposed in this pull request? This PR aims to upgrade ORC Format to 1.1.0. ### Why are the changes needed? To bring the latest feature and bug fixes. - https://github.com/apache/orc-format/milestone/2?closed=1 - apache/orc-format#18 - apache/orc-format#19 ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #2190 from dongjoon-hyun/ORC-1876. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to upgrade ORC Format to 1.1.0. ### Why are the changes needed? To bring the latest feature and bug fixes. - https://github.com/apache/orc-format/milestone/2?closed=1 - apache/orc-format#18 - apache/orc-format#19 ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #2190 from dongjoon-hyun/ORC-1876. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit f5e2413) Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to upgrade Apache ORC Format to 1.1.0 from 1.1.0 for Apache Spark 4.1.0. ### Why are the changes needed? Apache ORC Format v1.1.0 is released on 2025-04-18. - https://github.com/apache/orc-format/releases/tag/v1.1.0 To bring the latest feature and bug fixes. - https://github.com/apache/orc-format/milestone/2?closed=1 - apache/orc-format#18 - apache/orc-format#19 ### Does this PR introduce _any_ user-facing change? No, there is no behavior change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #50588 from dongjoon-hyun/SPARK-51801. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Add geometry and geography types to Apache ORC.
Why are the changes needed?
Geospatial support is a missing feature and it is supported by many popular databases, query engines, computing frameworks, etc.
How was this patch tested?
N/A