ORC-1920: [C++] Support Geometry and Geography types
#2269
Geometry and Geography types
dongjoon-hyun
left a comment
Thank you, @ffacs .
cc @wgtmac and @williamhyun
| /// | ||
| /// These values correspond to the 1, 2, ..., 7 component of the WKB integer | ||
| /// geometry type (i.e., the value of geometry_type % 1000). | ||
| enum class GeometryType { |
In Java, we had an additional value, -1, for UNKNOWN_TYPE_ID.
| private static final int UNKNOWN_TYPE_ID = -1; |
C++ employs a different implementation that invalidates statistics when unknown types are encountered; therefore, UNKNOWN_TYPE_ID is unnecessary in this context.
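To illustrate the point above, here is a minimal sketch of how a WKB integer geometry type can be split into its base type (`geometry_type % 1000`) and its dimension code (`geometry_type / 1000`), and how statistics might be invalidated on an unknown type instead of storing a sentinel such as -1. All names here (`decodeWkbType`, `GeoTypeStats`, the enumerator names) are hypothetical and are not taken from this patch; the invalidation strategy is only an assumption about how such logic could look.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical names for illustration only; not the actual ORC API.
enum class GeometryType {
  kPoint = 1,
  kLineString = 2,
  kPolygon = 3,
  kMultiPoint = 4,
  kMultiLineString = 5,
  kMultiPolygon = 6,
  kGeometryCollection = 7
};

// ISO WKB encodes the dimensions in the thousands digit:
// 0 = XY, 1 = XYZ, 2 = XYM, 3 = XYZM.
enum class Dimensions { kXY = 0, kXYZ = 1, kXYM = 2, kXYZM = 3 };

struct DecodedWkbType {
  GeometryType type;
  Dimensions dims;
};

// Returns std::nullopt for codes outside the known ranges.
inline std::optional<DecodedWkbType> decodeWkbType(uint32_t wkbType) {
  const uint32_t base = wkbType % 1000;
  const uint32_t dim = wkbType / 1000;
  if (base < 1 || base > 7 || dim > 3) {
    return std::nullopt;
  }
  return DecodedWkbType{static_cast<GeometryType>(base), static_cast<Dimensions>(dim)};
}

// Assumed behavior: once an unknown type is seen, the whole statistics
// object is marked invalid, so no UNKNOWN_TYPE_ID sentinel is stored.
struct GeoTypeStats {
  bool valid = true;
  std::set<uint32_t> types;

  void update(uint32_t wkbType) {
    if (!valid) {
      return;
    }
    if (!decodeWkbType(wkbType).has_value()) {
      valid = false;
      types.clear();
      return;
    }
    types.insert(wkbType);
  }
};
```

With this shape, a reader only needs to check `valid` before trusting `types`, which is one way to avoid carrying an "unknown" enumerator through the rest of the code.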
| } | ||
| std::stringstream ss; | ||
| ss << "<GeoStatistics>"; |
Ditto. Why do we have this only in the C++ implementation? If it is required, can we add it to Java first?
I think it is due to a gap between the Parquet Java and C++ implementations.
Yes, it is due to a gap between the Parquet Java and C++ implementations.
dongjoon-hyun
left a comment
I have a few minor comments about the feature parity between Java and C++ implementations.
williamhyun
left a comment
+1, thank you @ffacs for this PR, waiting on the above comment before merging.
Gentle ping, @ffacs .
Pong. Thank you for your review, @dongjoon-hyun . I'll update this patch in the next few days.
Thank you, @ffacs . Also, please participate in the 1.8.10 vote when you have some time.
Could you address the above comments, @ffacs ?
Let me know when it is ready for review.
@wgtmac @dongjoon-hyun This is ready for review now; please take a look when you're free.
Thank you, @ffacs .
| * This file contains code adapted from the Apache Arrow project. | ||
| * | ||
| * Original source: | ||
| * https://github.com/apache/arrow/blob/main/cpp/src/parquet/geospatial/statistics.h |
Please provide a tag-based location instead of the main branch, because main is always changing.
There are no tags that contain this patch yet.
| * This file contains code adapted from the Apache Arrow project. | ||
| * | ||
| * Original source: | ||
| * https://github.com/apache/arrow/blob/main/cpp/src/parquet/geospatial/statistics.cc |
Please provide a tag-based location instead of the main branch, because main is always changing.
dongjoon-hyun
left a comment
+1, LGTM.
c++/include/orc/Type.hh
Outdated
| virtual const std::string& getCRS() const = 0; | ||
| virtual geospatial::EIAlgo getEIAlgo() const = 0; |
| virtual const std::string& getCRS() const = 0; | |
| virtual geospatial::EIAlgo getEIAlgo() const = 0; | |
| virtual const std::string& getCrs() const = 0; | |
| virtual geospatial::EdgeInterpolationAlgorithm getAlgorithm() const = 0; |
It would be good to add a comment saying that these two functions apply to the geometry and geography types only.
| void WKBGeometryBounder::mergeGeometryInternal(WKBBuffer* src, bool recordWkbType) { | ||
| uint8_t endian = src->ReadUInt8(); | ||
| bool swap = endian != 0x00; | ||
| if (isLittleEndian()) { |
Perhaps we can use if constexpr (std::endian::native == std::endian::little)
std::endian requires C++20; we are using C++17 for now.
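For context, a C++17-compatible alternative to `std::endian` might look like the sketch below. It detects host byte order at runtime and swaps a 32-bit value only when the WKB byte-order marker (0x00 for big-endian, 0x01 for little-endian) differs from the host. The helper names `isLittleEndianHost`, `byteSwap32`, and `readWkbUInt32` are hypothetical and are not the functions used in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not the ORC implementation.

// Runtime host byte-order check that works in C++17 (no std::endian).
inline bool isLittleEndianHost() {
  const uint32_t probe = 1;
  unsigned char firstByte = 0;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Reverses the byte order of a 32-bit value.
inline uint32_t byteSwap32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Reads a 32-bit unsigned integer from a WKB buffer.
// wkbByteOrder is the WKB marker byte: 0x00 = big-endian, 0x01 = little-endian.
inline uint32_t readWkbUInt32(const unsigned char* data, uint8_t wkbByteOrder) {
  uint32_t value = 0;
  std::memcpy(&value, data, sizeof(value));
  const bool dataIsLittle = (wkbByteOrder == 0x01);
  // Swap only when the data byte order differs from the host byte order.
  return dataIsLittle == isLittleEndianHost() ? value : byteSwap32(value);
}
```

A runtime check like this is usually folded away by optimizing compilers, though that is not guaranteed the way `if constexpr` with `std::endian` would be under C++20.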
My only concern is the name of the public API: https://github.com/apache/orc/pull/2269/files#r2184135226. It is better to use
Thank you everyone, @ffacs , @williamhyun , @wgtmac . |
What changes were proposed in this pull request?
Support the Geometry and Geography types on the C++ side.
Why are the changes needed?
Add support for Geometry and Geography types
How was this patch tested?
Unit tests passed.
Was this patch authored or co-authored using generative AI tooling?
No