-
Notifications
You must be signed in to change notification settings - Fork 44
feat(spec): Add primitive data types #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
crates/paimon/Cargo.toml
Outdated
| version.workspace = true | ||
|
|
||
| [dependencies] | ||
| chrono = "0.4.38" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been changed in #15.
crates/paimon/src/spec/types.rs
Outdated
| } | ||
|
|
||
| pub trait DataTypeTrait { | ||
| /** Returns whether a value of this type can be {@code null}. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, please use /// for comments.
| #[repr(u8)] | ||
| #[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize)] | ||
| #[serde(rename_all("camelCase"))] | ||
| pub enum DataTypeRoot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we implementing bitset? How about using bitflags instead?
crates/paimon/src/spec/types.rs
Outdated
| type_root: DataTypeRoot, | ||
| } | ||
|
|
||
| pub trait DataTypeTrait { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: We need this trait because we want users to have their own data type?
crates/paimon/src/spec/types.rs
Outdated
| state.end() | ||
| } | ||
|
|
||
| fn equals(&self, other: &Self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust doesn't implement equals or hash in this way. We need to use Hash and Eq/PartialEq.
| #[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize)] | ||
| pub struct DecimalType { | ||
| element_type: DataType, | ||
| precision: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using rust_decimal: https://docs.rs/rust_decimal/latest/rust_decimal/
|
Thanks @Xuanwo @SteNicholas I will modify it according to the comments later. |
Aitozi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XuQianJin-Stars Thanks for your contribution, Left some minor comments.
crates/paimon/src/spec/types.rs
Outdated
| } | ||
|
|
||
| /// Returns true if the data type is with the family. | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L214> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line number seems not correct.
crates/paimon/src/spec/types.rs
Outdated
| } | ||
| } | ||
|
|
||
| /// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect comment
crates/paimon/src/spec/types.rs
Outdated
| .any(|f: &DataTypeFamily| self.is_with_family(f.clone())) | ||
| } | ||
|
|
||
| /// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect comment
crates/paimon/src/spec/types.rs
Outdated
| self.type_root.families().contains(family) | ||
| } | ||
|
|
||
| /// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect comment
crates/paimon/src/spec/types.rs
Outdated
| self.copy(self.is_nullable) | ||
| } | ||
|
|
||
| /// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
crates/paimon/src/spec/types.rs
Outdated
| self.copy(false) | ||
| } | ||
|
|
||
| fn as_sql_string(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should be moved to the subclass, Right?
crates/paimon/src/spec/types.rs
Outdated
| } | ||
|
|
||
| pub fn as_sql_string(&self) -> String { | ||
| format!("ARRAY<{}>", self.element_type.as_sql_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nullability is not displayed in the sql string.
| /// | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/master/paimon-common/src/main/java/org/apache/paimon/types/DataTypeFamily.java> | ||
| #[derive(Debug, Clone, PartialEq, Eq)] | ||
| pub struct DataTypeFamily: u32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why we use bitset here instead of the original HashSet ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use a bitset here for easier readability and to avoid unnecessary allocation. DataTypeFamily just a u32 which can store in stack but HashSet needs heap allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it, thanks.
| type_root: DataTypeRoot, | ||
| } | ||
|
|
||
| impl Display for DataType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should impl Display for the concrete types such as BigIntType not for the DataType? Otherwise, we can not directly print the subtypes.
Xuanwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your work. We are on the right track. There are some comments for detailed API design, please take a look.
crates/paimon/src/spec/types.rs
Outdated
| /// The root of data type. | ||
| /// | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataTypeRoot.java#L49> | ||
| #[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize, Hash)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make DataTypeRoot Copy?
crates/paimon/src/spec/types.rs
Outdated
|
|
||
| /// Returns a deep copy of this type. It requires an implementation of {@link #copy(boolean)}. | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L120> | ||
| fn copy_with_nullable(&self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Copy in rust.
crates/paimon/src/spec/types.rs
Outdated
|
|
||
| /// Returns a deep copy of this type with possibly different nullability. | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L113> | ||
| fn copy(&self, is_nullable: bool) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make this API called fn with_nullable(self, is_nullable: bool) -> Self
crates/paimon/src/spec/types.rs
Outdated
|
|
||
| /// Compare two data types without nullable. | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L129> | ||
| fn copy_ignore_nullable(&self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is duplicated with with_nullable(false)?
crates/paimon/src/spec/types.rs
Outdated
| self.copy(false) | ||
| } | ||
|
|
||
| fn serialize_json(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we don't need to provide API like this directly. If we have special requirement for Serialize, we need to implement by hand instead of using #[derive(Serialize)]
crates/paimon/src/spec/types.rs
Outdated
| } | ||
| } | ||
|
|
||
| pub fn default_value() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same.
|
|
||
| impl Display for FloatType { | ||
| fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { | ||
| if !self.element_type.is_nullable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same.
| /// | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/master/paimon-common/src/main/java/org/apache/paimon/types/TimestampType.java>. | ||
| #[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize, Hash)] | ||
| pub struct LocalZonedTimestampType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, paimon has a special type for local zone timestamp.
crates/paimon/src/spec/types.rs
Outdated
| self.precision | ||
| ) | ||
| } else { | ||
| write!(f, "TIMESTAMP WITH LOCAL TIME ZONE({})", self.precision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can implement like:
write!(f, "TIMESTAMP WITH LOCAL TIME ZONE({})", self.precision)?;
if !self.element_type.is_nullable() {
write!(f, " NOT NULL")?;
}
Ok(())
crates/paimon/src/spec/types.rs
Outdated
| Self::new(true, precision) | ||
| } | ||
|
|
||
| pub fn default_value() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same.
Xuanwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM to me now. There all only some small nits.
crates/paimon/src/spec/types.rs
Outdated
| /// Returns whether the family type of the type equals to the family or not. | ||
| /// | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L103> | ||
| fn with_family(&self, family: DataTypeFamily) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about naming it is_family?
crates/paimon/src/spec/types.rs
Outdated
|
|
||
| /// Returns whether the root of the type is part of at least one family of the families or not. | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L94> | ||
| fn is_any_with_family(&self, families: &[DataTypeFamily]) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about naming it is_any_of_family to align with is_any_of?
crates/paimon/src/spec/types.rs
Outdated
| fn is_any_with_family(&self, families: &[DataTypeFamily]) -> bool { | ||
| families | ||
| .iter() | ||
| .any(|f: &DataTypeFamily| self.with_family(f.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataTypeFamily is Copy, so I'm guessing we don't need to clone it.
Have you tried:
families
.iter()
.any(self.is_family)
crates/paimon/src/spec/types.rs
Outdated
|
|
||
| /// Returns a deep copy of this type with possibly different nullability. | ||
| /// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L113> | ||
| fn with_nullable(&self, is_nullable: bool) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can place construct API in the front to make it easier to find.
Xuanwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks a lot for your effort!
|
cc @SteNicholas, @Aitozi and @JingsongLi for another look. |
|
I have one more question, Why the DataType is not defined as trait ? If not, how the DataType used in other class. eg: |
Hi, thanks for bringing this up. I don't quite understand the issue you mentioned. Are you trying to say that it's possible for users to implement they own data types and we don't know how to display those types? If so, I believe current implemantion of impl Display for ArrayType {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
if self.element_type.is_nullable() {
write!(f, "ARRAY")
} else {
write!(f, "ARRAY NOT NULL")
}
}
}This is the part of json represent of array. In fact, we need to use |
|
After some offline discussion with @Aitozi, I got his point now. I think we need to build an enum for |
|
An enum |
Aitozi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This PR will add primitive data types.