[755] Support column stats for paimon #767
xtable-core/src/main/java/org/apache/xtable/paimon/PaimonDataFileExtractor.java
List<String> colNames = file.valueStatsCols();
// log.info("valueStatsCols: {}", colNames);
if (colNames == null || colNames.isEmpty()) {
  // if column names are not present, we assume all columns in the schema are present in the same order as the schema - TODO: validate this assumption
Q to Paimon experts: Is this assumption valid?
@mao-liu are you in contact with anyone on the Paimon side to get these questions answered?
@mikedias do you know the answers to any of these Paimon questions on this PR by any chance?
Sorry, I don't have these answers...
Hey @the-other-tim-brown, apologies, I haven't been very active on this PR until this week.
I have just emailed the Paimon user group about these questions and am hoping to hear back soon.
We have been busy test-driving this change and are happy to report it's working well thus far!
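To make the assumption discussed in this thread concrete, here is a minimal sketch of the fallback in isolation. `StatsColumnResolver` and `resolve` are hypothetical names used only for illustration; they are not part of this PR or of the Paimon API, and plain string lists stand in for the real schema types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the PR's implementation.
final class StatsColumnResolver {
  private StatsColumnResolver() {}

  /**
   * Decides which columns a file's value stats describe.
   *
   * @param valueStatsCols column names recorded alongside the stats; may be null or empty
   * @param schemaFieldNames every field name of the table schema, in schema order
   */
  static List<String> resolve(List<String> valueStatsCols, List<String> schemaFieldNames) {
    if (valueStatsCols == null || valueStatsCols.isEmpty()) {
      // Assumption under discussion: the stats cover all schema columns, in schema order.
      return new ArrayList<>(schemaFieldNames);
    }
    // Otherwise the stats cover exactly the listed columns, in the listed order.
    return new ArrayList<>(valueStatsCols);
  }
}
```

Isolating the fallback like this would make it easy to unit test both branches once the assumption is confirmed or refuted.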
c0cabf3 to 2757735
2757735 to 6e927e2
// TODO: Implement logic to extract column stats from the file meta
// https://github.com/apache/incubator-xtable/issues/755
return Collections.emptyList();
private List<ColumnStat> toColumnStats(DataFileMeta file, InternalSchema internalSchema) {
Can we move the column stats conversion to its own class? In the future if we add Paimon as a target then we will also need to convert to the Paimon representation and it would be nice to have all this stats logic in its own class.
Will do, thanks for the early review @the-other-tim-brown !
I do wonder though, if it is even possible to have Paimon as a target... Paimon has a pretty unique file layout, and might not be as easily "tricked" as other formats.
I don't know enough about Paimon to say. Hudi also has a unique native layout to support update-heavy workloads, though, and we were able to make this work.
Mainly, we do this separation to keep the logic isolated. This is not necessarily relevant to Paimon, but if a table format changes how it represents stats in a new version, we can plug in the appropriate converter based on the version.
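As a rough sketch of the "plug in the appropriate converter based on the version" idea: everything below is hypothetical. The `StatsConverter` interface, the `StatsConverters.forVersion` factory, the version numbers, and the raw stat layouts are invented for illustration and do not correspond to XTable or Paimon classes.

```java
import java.util.Arrays;

// Hypothetical interface: turns a format-specific raw stat array into {min, max}.
interface StatsConverter {
  long[] toMinMax(long[] raw);
}

// Hypothetical factory that selects a converter from a format version number.
final class StatsConverters {
  private StatsConverters() {}

  static StatsConverter forVersion(int formatVersion) {
    if (formatVersion >= 2) {
      // Invented v2 layout: {nullCount, min, max}; drop the leading null count.
      return raw -> Arrays.copyOfRange(raw, 1, 3);
    }
    // Invented v1 layout: already {min, max}; return a defensive copy.
    return raw -> Arrays.copyOf(raw, 2);
  }
}
```

With the conversion behind one interface, the extractor only asks for `StatsConverters.forVersion(...)` and stays unchanged when a new representation is added.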
Hey @the-other-tim-brown, we have validated the assumptions in this PR with responses from Paimon maintainers - no more TODOs from me, ready for your review now :)
Important Read
Closes #755
What is the purpose of the pull request
Adds support for Paimon metadata stats
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
Dev Notes