Skip to content

Conversation

@universalmind303
Copy link
Contributor

Which issue does this PR close?

Closes #9139

Rationale for this change

better support for working with fixed size lists is needed

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Feb 21, 2024
@jayzhan211

This comment was marked as outdated.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Feb 23, 2024

arrow cast does not support to casting to wildcard size, after coerce_from the original size info is lost, how do you process the wildcard size instead

@universalmind303
Copy link
Contributor Author

arrow cast does not support to casting to wildcard size, after coerce_from the original size info is lost, how do you process the wildcard size instead

I think i actually had this switched around.

FixedSizeList(_, FIXED_SIZE_LIST_WILDCARD)) should coerce to FixedSizeList(_, size)) not the other way around.

Comment on lines 386 to 387
if matches!(type_from, FixedSizeList(_, FIXED_SIZE_LIST_WILDCARD))
&& list_ndims(type_from) == list_ndims(type_into) =>
Copy link
Contributor

@gruuya gruuya Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there should also be a condition for checking coerce-ability of the base types here (and maybe a negative test too), something along the lines of:

... && coerced_from(base_type(type_into), base_type(type_from)) == Some(base_type(type_into))

Though that should probably be the case for the other two list data types as well.

Copy link
Contributor

@gruuya gruuya Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there should even be a check for all nested types, which will cover the base case as well (and the ndims check I think?)

FixedSizeList(f_into, FIXED_SIZE_LIST_WILDCARD) => match type_from {
    FixedSizeList(f_from, size_from)
        if coerced_from(f_into.data_type(), f_from.data_type())
            == Some(f_into.data_type().clone()) =>
    {
        Some(FixedSizeList(f_into.clone(), *size_from))
    }
    _ => None,
},

(with @jayzhan211 fix too)

EDIT: This does not really handle swapping the wildcard size in any nested FixedSizeLists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok got a bit nerd-sniped here it seems, but I think the following actually handles the nested FixedSizeLists too:

FixedSizeList(f_into, FIXED_SIZE_LIST_WILDCARD) => match type_from {
    FixedSizeList(f_from, size_from) => {
        match coerced_from(f_into.data_type(), f_from.data_type()) {
            Some(data_type) if &data_type != f_into.data_type() => {
                let new_field =
                    Arc::new(f_into.as_ref().clone().with_data_type(data_type));
                Some(FixedSizeList(new_field, *size_from))
            }
            Some(_) => Some(FixedSizeList(f_into.clone(), *size_from)),
            _ => None,
        }
    }
    _ => None,
},

The following test passes (as does the one by @jayzhan211 ):

#[test]
fn test_nested_wildcard_fixed_size_lists() -> Result<()> {
    let type_into = DataType::FixedSizeList(
        Arc::new(Field::new(
            "item",
            DataType::FixedSizeList(
                Arc::new(Field::new("item", DataType::Int32, false)),
                FIXED_SIZE_LIST_WILDCARD,
            ),
            false,
        )),
        FIXED_SIZE_LIST_WILDCARD,
    );

    let type_from = DataType::FixedSizeList(
        Arc::new(Field::new(
            "item",
            DataType::FixedSizeList(
                Arc::new(Field::new("item", DataType::Int8, false)),
                4,
            ),
            false,
        )),
        3,
    );

    assert_eq!(
        coerced_from(&type_into, &type_from),
        Some(DataType::FixedSizeList(
            Arc::new(Field::new(
                "item",
                DataType::FixedSizeList(
                    Arc::new(Field::new("item", DataType::Int32, false)),
                    4,
                ),
                false,
            )),
            3,
        ))
    );

    Ok(())
}

@jayzhan211
Copy link
Contributor

jayzhan211 commented Feb 24, 2024

If I understand correctly, you should pass the test like, but I fail the test.

        let inner = Arc::new(Field::new("item", DataType::Int32, false));
        let current_types = vec![
            DataType::FixedSizeList(inner.clone(), 2), // able to coerce for any size
        ];
        let signature = Signature::exact(vec![
            DataType::FixedSizeList(inner.clone(), FIXED_SIZE_LIST_WILDCARD),
        ], Volatility::Stable);
        let coerced_data_types = data_types(&current_types, &signature).unwrap();
        assert_eq!(coerced_data_types, current_types);

the rule is probably this one

        FixedSizeList(_, FIXED_SIZE_LIST_WILDCARD)
            if matches!(type_from, FixedSizeList(_, _))
                && list_ndims(type_from) == list_ndims(type_into) =>
        {
            Some(type_from.clone())
        }

@jayzhan211
Copy link
Contributor

Feel free to ping me if it is ready for review

@jayzhan211 jayzhan211 marked this pull request as draft March 5, 2024 09:03
@universalmind303 universalmind303 marked this pull request as ready for review March 5, 2024 14:32
@universalmind303
Copy link
Contributor Author

@jayzhan211 it should be ready for review now.

also @gruuya thanks for the suggestions. That snippet seems to get all tests to pass!

@universalmind303
Copy link
Contributor Author

universalmind303 commented Mar 5, 2024

test failures seem unrelated. I get same errors on main. #9467

@gruuya
Copy link
Contributor

gruuya commented Mar 5, 2024

One last thing that might be missing is handling the non-wildcard scenario, i.e. when someone might want to specify an specific/explicit size for the FixedSizeList in the signature.

@universalmind303
Copy link
Contributor Author

One last thing that might be missing is handling the non-wildcard scenario, i.e. when someone might want to specify an specific/explicit size for the FixedSizeList in the signature.

I just added a test case for that.

Comment on lines +537 to +543
// make sure it can't coerce to a different size
let signature = Signature::exact(
vec![DataType::FixedSizeList(inner.clone(), 3)],
Volatility::Stable,
);
let coerced_data_types = data_types(&current_types, &signature);
assert!(coerced_data_types.is_err());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good assert, though what i had in mind is that it probably wouldn't work for current_types = vec![DataType::FixedSizeList(inner.clone(), 3)], even though in that case it should, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so like making sure the same type works?

        let current_types = vec![
            DataType::FixedSizeList(inner.clone(), 2), // able to coerce for any size
        ];

        // make sure it works with the same type.
        let signature = Signature::exact(
            vec![DataType::FixedSizeList(inner.clone(), 2)],
            Volatility::Stable,
        );
        let coerced_data_types = data_types(&current_types, &signature).unwrap();
        assert_eq!(coerced_data_types, current_types);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

@alamb
Copy link
Contributor

alamb commented Mar 5, 2024

@gruuya do you think this PR is good to go?

@gruuya
Copy link
Contributor

gruuya commented Mar 5, 2024

Yup, looks good!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @universalmind303 for the contribution and @gruuya for the review

One thing that might be valuable to do is to write some sort of test (perhaps a UDF in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_scalar_functions.rs) that used this coercion rule and showed it being called with FixedSizedLists of difference sizes.

I was thinking that way the "end to end" desired behavior would also be encoded in a test.

However, I think this PR already has adequate tests, so we can add additional tests as a follow on PR if we want

@alamb alamb merged commit 3854419 into apache:main Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Specialized TypeSignature for FixedSizeList

4 participants