Skip to content

Conversation

@ffacs
Copy link
Contributor

@ffacs ffacs commented Jun 16, 2025

What changes were proposed in this pull request?
Support Geometry and Geography types for c++ side

Why are the changes needed?
Add support for Geometry and Geography types

How was this patch tested?
UT passed

Was this patch authored or co-authored using generative AI tooling?
No

@ffacs ffacs changed the title ORC-1920: Support Geometry and Geography types ORC-1920: [C++] Support Geometry and Geography types Jun 16, 2025
@github-actions github-actions bot added the CPP label Jun 16, 2025
@ffacs ffacs changed the title ORC-1920: [C++] Support Geometry and Geography types ORC-1920: [C++] Support Geometry and Geography types Jun 16, 2025
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @ffacs .

cc @wgtmac and @williamhyun

@dongjoon-hyun dongjoon-hyun added this to the 2.2.0 milestone Jun 16, 2025
///
/// These values correspond to the 1, 2, ..., 7 component of the WKB integer
/// geometry type (i.e., the value of geometry_type % 1000).
enum class GeometryType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java, we had an additional value, -1, for UNKNOWN_TYPE_ID.

private static final int UNKNOWN_TYPE_ID = -1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ employs a different implementation that invalidates statistics when unknown types are encountered; therefore, UNKNOWN_TYPE_ID is unnecessary in this context.

}

std::stringstream ss;
ss << "<GeoStatistics>";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. Why do we have this only in C++ implementation? If this is required, can we have this in Java first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is due to gap between Parquet Java and C++ implementations...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is due to gap between Parquet Java and C++ implementations..

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few minor comments about the feature parity between Java and C++ implementations.

Copy link
Member

@williamhyun williamhyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @ffacs for this PR, waiting on the above comment before merging.

@dongjoon-hyun
Copy link
Member

Gentle ping, @ffacs .

@ffacs
Copy link
Contributor Author

ffacs commented Jun 24, 2025

Gentle ping, @ffacs .

Pong. Thank you for your review @dongjoon-hyun , I'd update this patch these days.

@dongjoon-hyun
Copy link
Member

Thank you, @ffacs . Also, please participate the 1.8.10 vote too when you have some time.

}

std::stringstream ss;
ss << "<GeoStatistics>";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is due to gap between Parquet Java and C++ implementations...

@dongjoon-hyun
Copy link
Member

Could you fix the above comments, @ffacs ?

@wgtmac
Copy link
Member

wgtmac commented Jul 3, 2025

Let me know when ready to review.

@ffacs
Copy link
Contributor Author

ffacs commented Jul 3, 2025

@wgtmac @dongjoon-hyun I'm ready to review now, please take a look when you're free~

@dongjoon-hyun
Copy link
Member

Thank you, @ffacs .

* This file contains code adapted from the Apache Arrow project.
*
* Original source:
* https://github.com/apache/arrow/blob/main/cpp/src/parquet/geospatial/statistics.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a tag based location instead of the main branch because it changes always.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tags that contain this patch yet.

* This file contains code adapted from the Apache Arrow project.
*
* Original source:
* https://github.com/apache/arrow/blob/main/cpp/src/parquet/geospatial/statistics.cc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a tag based location instead of the main branch because it changes always.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Comment on lines 77 to 78
virtual const std::string& getCRS() const = 0;
virtual geospatial::EIAlgo getEIAlgo() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
virtual const std::string& getCRS() const = 0;
virtual geospatial::EIAlgo getEIAlgo() const = 0;
virtual const std::string& getCrs() const = 0;
virtual geospatial::EdgeInterpolationAlgorithm getAlgorithm() const = 0;

It would be good to add comment to say these two functions are for geometry & geography types only.

void WKBGeometryBounder::mergeGeometryInternal(WKBBuffer* src, bool recordWkbType) {
uint8_t endian = src->ReadUInt8();
bool swap = endian != 0x00;
if (isLittleEndian()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can use if constexpr (std::endian::native == std::endian::little)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::endian requires C++20, we are using c++17 now.

@dongjoon-hyun
Copy link
Member

Is this ready or do we need more revision, @ffacs and @wgtmac ?

@wgtmac
Copy link
Member

wgtmac commented Jul 13, 2025

Is this ready or do we need more revision, @ffacs and @wgtmac ?

My only concern is the name of public api: https://github.com/apache/orc/pull/2269/files#r2184135226. It is better to use getCrs() and getAlgorithm()/getEdgeAlgorithm()

@dongjoon-hyun
Copy link
Member

Thank you everyone, @ffacs , @williamhyun , @wgtmac .
Merged to main/2.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants