feat: add support for fixed list wildcard in type signature #9312
arrow cast does not support casting to a wildcard size, after
I think I actually had this switched around.
    if matches!(type_from, FixedSizeList(_, FIXED_SIZE_LIST_WILDCARD))
        && list_ndims(type_from) == list_ndims(type_into) =>
Perhaps there should also be a condition for checking coerce-ability of the base types here (and maybe a negative test too), something along the lines of:
    ... && coerced_from(base_type(type_into), base_type(type_from)) == Some(base_type(type_into))
Though that should probably be the case for the other two list data types as well.
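To make the suggested guard concrete, here is a minimal, self-contained sketch. It does not use the real arrow or DataFusion types: `Ty`, `WILDCARD`, `coerce_scalar`, and `wildcard_guard` are hypothetical stand-ins invented for illustration, and the toy coercion only knows how to widen `Int8` to `Int32`. `base_type` and `list_ndims` mirror the helper names used in the snippet above, but their bodies here are assumptions.

```rust
// Toy stand-in for Arrow's DataType; NOT the real arrow/DataFusion API.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int8,
    Int32,
    Utf8,
    FixedSizeList(Box<Ty>, i32),
}

// Sentinel meaning "any size"; the concrete value is an assumption of this sketch.
const WILDCARD: i32 = i32::MIN;

// Innermost non-list type of a (possibly nested) list.
fn base_type(t: &Ty) -> &Ty {
    match t {
        Ty::FixedSizeList(inner, _) => base_type(inner),
        other => other,
    }
}

// Number of nested list levels.
fn list_ndims(t: &Ty) -> usize {
    match t {
        Ty::FixedSizeList(inner, _) => 1 + list_ndims(inner),
        _ => 0,
    }
}

// Toy scalar coercion: identical types, or Int8 widened to Int32.
fn coerce_scalar(into: &Ty, from: &Ty) -> Option<Ty> {
    match (into, from) {
        (a, b) if a == b => Some(a.clone()),
        (Ty::Int32, Ty::Int8) => Some(Ty::Int32),
        _ => None,
    }
}

// Guard: wildcard-sized target, list source, same depth, coercible base types.
fn wildcard_guard(into: &Ty, from: &Ty) -> bool {
    matches!(into, Ty::FixedSizeList(_, WILDCARD))
        && matches!(from, Ty::FixedSizeList(_, _))
        && list_ndims(into) == list_ndims(from)
        && coerce_scalar(base_type(into), base_type(from)).as_ref() == Some(base_type(into))
}
```

With this guard, a wildcard list of `Int32` accepts a fixed-size list of `Int8` but rejects a list of `Utf8`, which is the kind of negative test the comment asks for.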
Maybe there should even be a check for all nested types, which will cover the base case as well (and the ndims check I think?)
    FixedSizeList(f_into, FIXED_SIZE_LIST_WILDCARD) => match type_from {
        FixedSizeList(f_from, size_from)
            if coerced_from(f_into.data_type(), f_from.data_type())
                == Some(f_into.data_type().clone()) =>
        {
            Some(FixedSizeList(f_into.clone(), *size_from))
        }
        _ => None,
    },
(with @jayzhan211 fix too)
EDIT: This does not really handle swapping the wildcard size in any nested FixedSizeLists
Ok got a bit nerd-sniped here it seems, but I think the following actually handles the nested FixedSizeLists too:
    FixedSizeList(f_into, FIXED_SIZE_LIST_WILDCARD) => match type_from {
        FixedSizeList(f_from, size_from) => {
            match coerced_from(f_into.data_type(), f_from.data_type()) {
                Some(data_type) if &data_type != f_into.data_type() => {
                    let new_field =
                        Arc::new(f_into.as_ref().clone().with_data_type(data_type));
                    Some(FixedSizeList(new_field, *size_from))
                }
                Some(_) => Some(FixedSizeList(f_into.clone(), *size_from)),
                _ => None,
            }
        }
        _ => None,
    },
The following test passes (as does the one by @jayzhan211 ):
    #[test]
    fn test_nested_wildcard_fixed_size_lists() -> Result<()> {
        let type_into = DataType::FixedSizeList(
            Arc::new(Field::new(
                "item",
                DataType::FixedSizeList(
                    Arc::new(Field::new("item", DataType::Int32, false)),
                    FIXED_SIZE_LIST_WILDCARD,
                ),
                false,
            )),
            FIXED_SIZE_LIST_WILDCARD,
        );
        let type_from = DataType::FixedSizeList(
            Arc::new(Field::new(
                "item",
                DataType::FixedSizeList(
                    Arc::new(Field::new("item", DataType::Int8, false)),
                    4,
                ),
                false,
            )),
            3,
        );
        assert_eq!(
            coerced_from(&type_into, &type_from),
            Some(DataType::FixedSizeList(
                Arc::new(Field::new(
                    "item",
                    DataType::FixedSizeList(
                        Arc::new(Field::new("item", DataType::Int32, false)),
                        4,
                    ),
                    false,
                )),
                3,
            ))
        );
        Ok(())
    }
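For readers who want to see the recursion in isolation, the nested case can be sketched without arrow at all. This is a rough model under stated assumptions, not the code in this PR: `Ty` and `WILDCARD` are hypothetical stand-ins for `DataType` and `FIXED_SIZE_LIST_WILDCARD`, field names and nullability are dropped, and the only scalar widening the toy `coerced_from` knows is `Int8` to `Int32`.

```rust
// Toy stand-in for Arrow's DataType (no field metadata); NOT the real API.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int8,
    Int32,
    Utf8,
    FixedSizeList(Box<Ty>, i32),
}

// Sentinel meaning "any size"; the concrete value is an assumption of this sketch.
const WILDCARD: i32 = i32::MIN;

// Coerce `from` into the shape requested by `into`.
// A wildcard-sized target adopts the source's size at every nesting level.
fn coerced_from(into: &Ty, from: &Ty) -> Option<Ty> {
    match (into, from) {
        // Wildcard target: recurse into the element type, keep the source size.
        (Ty::FixedSizeList(inner_into, WILDCARD), Ty::FixedSizeList(inner_from, size_from)) => {
            coerced_from(inner_into, inner_from)
                .map(|inner| Ty::FixedSizeList(Box::new(inner), *size_from))
        }
        // Identical types (including lists with the same explicit size).
        (a, b) if a == b => Some(a.clone()),
        // The only scalar widening this toy model supports.
        (Ty::Int32, Ty::Int8) => Some(Ty::Int32),
        _ => None,
    }
}
```

Because the wildcard arm recurses before rebuilding the list, the inner wildcard is replaced by the inner source size (4 in the test above) and the outer one by the outer size (3), while the base type is widened on the way back up.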
If I understand correctly, a test like the following should pass, but it fails for me:
    let inner = Arc::new(Field::new("item", DataType::Int32, false));
    let current_types = vec![
        DataType::FixedSizeList(inner.clone(), 2), // able to coerce for any size
    ];
    let signature = Signature::exact(vec![
        DataType::FixedSizeList(inner.clone(), FIXED_SIZE_LIST_WILDCARD),
    ], Volatility::Stable);
    let coerced_data_types = data_types(&current_types, &signature).unwrap();
    assert_eq!(coerced_data_types, current_types);
The rule is probably this one:
    FixedSizeList(_, FIXED_SIZE_LIST_WILDCARD)
        if matches!(type_from, FixedSizeList(_, _))
            && list_ndims(type_from) == list_ndims(type_into) =>
    {
        Some(type_from.clone())
    }
Feel free to ping me if it is ready for review
@jayzhan211 it should be ready for review now. Also, @gruuya, thanks for the suggestions. That snippet seems to get all tests to pass!
Test failures seem unrelated. I get the same errors on main. #9467
One last thing that might be missing is handling the non-wildcard scenario, i.e. when someone might want to specify a specific/explicit size for the
I just added a test case for that.
    // make sure it can't coerce to a different size
    let signature = Signature::exact(
        vec![DataType::FixedSizeList(inner.clone(), 3)],
        Volatility::Stable,
    );
    let coerced_data_types = data_types(&current_types, &signature);
    assert!(coerced_data_types.is_err());
This is a good assert, though what I had in mind is that it probably wouldn't work for current_types = vec![DataType::FixedSizeList(inner.clone(), 3)], even though in that case it should, right?
so like making sure the same type works?
    let current_types = vec![
        DataType::FixedSizeList(inner.clone(), 2), // able to coerce for any size
    ];
    // make sure it works with the same type.
    let signature = Signature::exact(
        vec![DataType::FixedSizeList(inner.clone(), 2)],
        Volatility::Stable,
    );
    let coerced_data_types = data_types(&current_types, &signature).unwrap();
    assert_eq!(coerced_data_types, current_types);
Yes, exactly.
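The three cases discussed in this thread (wildcard size, mismatched explicit size, matching explicit size) can be summarized in one small, self-contained model. This is only a sketch under assumptions: `Ty`, `WILDCARD`, `coerce_arg`, and `check_exact` are hypothetical names that loosely imitate `DataType`, `FIXED_SIZE_LIST_WILDCARD`, and the `data_types` check against a `Signature::exact`. It is not the DataFusion implementation.

```rust
// Toy stand-in for Arrow's DataType; NOT the real arrow/DataFusion API.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    FixedSizeList(Box<Ty>, i32),
}

// Sentinel meaning "any size"; the concrete value is an assumption of this sketch.
const WILDCARD: i32 = i32::MIN;

// Check one argument against one expected type.
fn coerce_arg(expected: &Ty, actual: &Ty) -> Option<Ty> {
    match (expected, actual) {
        // Wildcard size: accept any list size, as long as the elements coerce.
        (Ty::FixedSizeList(e, WILDCARD), Ty::FixedSizeList(a, size)) => {
            coerce_arg(e, a).map(|inner| Ty::FixedSizeList(Box::new(inner), *size))
        }
        // Explicit size (or scalar): the types must match exactly.
        (e, a) if e == a => Some(a.clone()),
        _ => None,
    }
}

// Hypothetical analogue of checking arguments against an exact signature.
fn check_exact(expected: &[Ty], actual: &[Ty]) -> Result<Vec<Ty>, String> {
    if expected.len() != actual.len() {
        return Err(format!(
            "expected {} arguments, got {}",
            expected.len(),
            actual.len()
        ));
    }
    expected
        .iter()
        .zip(actual)
        .map(|(e, a)| coerce_arg(e, a).ok_or_else(|| format!("cannot coerce {:?} to {:?}", a, e)))
        .collect()
}
```

In this model, a size-2 list passes against a wildcard signature and against an explicit size-2 signature, and is rejected by an explicit size-3 signature, matching the behavior the tests above are meant to pin down.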
@gruuya do you think this PR is good to go?
Yup, looks good!
alamb left a comment
Looks good to me -- thank you @universalmind303 for the contribution and @gruuya for the review
One thing that might be valuable to do is to write some sort of test (perhaps a UDF in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_scalar_functions.rs) that uses this coercion rule and shows it being called with FixedSizeLists of different sizes.
I was thinking that way the "end to end" desired behavior would also be encoded in a test.
However, I think this PR already has adequate tests, so we can add additional tests as a follow on PR if we want
Which issue does this PR close?
Closes #9139
Rationale for this change
better support for working with fixed size lists is needed
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?