Incorrect Behavior of Collecting a filtered iterator to a BooleanArray #8505

@tobixdev

Describe the bug

Collecting an iterator into a BooleanArray produces unintuitive results when the upper bound of the iterator's size hint is an overestimate. At least, that is what I think is happening after looking at the code.

Is this intended behavior? If not, I could try to come up with a fix.
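
For context on the upper bound mentioned above: a filtered iterator cannot know in advance how many items will pass the predicate, so the standard library's `filter` adapter reports a lower bound of 0 and keeps the upper bound of the underlying iterator. The sketch below uses only the standard library; the helper name `hint_and_actual` is hypothetical and exists only for illustration.

```rust
/// Returns the `size_hint` of an iterator that filters out `None` values,
/// together with the number of items the iterator actually yields.
fn hint_and_actual(values: Vec<Option<bool>>) -> ((usize, Option<usize>), usize) {
    let iter = values.into_iter().filter(Option::is_some);
    // `filter` cannot predict how many items pass, so the upper bound is
    // simply inherited from the underlying `Vec` iterator.
    let hint = iter.size_hint();
    // Consuming the iterator reveals the real number of items.
    let actual = iter.count();
    (hint, actual)
}
```

For the input used in the reproduction, `vec![Some(true), None, Some(true), Some(false)]`, the hint is `(0, Some(4))` while only 3 items are yielded, so the upper bound overestimates the length by one.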

To Reproduce

Tested with Arrow v56.2.0 (via DataFusion 50)

The following test reproduces this:

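    use arrow::array::BooleanArray;
    use insta::assert_debug_snapshot;

    #[test]
    fn test_boolean_array_from() {
        let values = vec![Some(true), None, Some(true), Some(false)]
            .into_iter()
            .filter(Option::is_some)
            .collect::<BooleanArray>();
        // The snapshot below is the actual (unexpected) output: note the trailing null.
        assert_debug_snapshot!(values, @r"
        BooleanArray
        [
          true,
          true,
          false,
          null,
        ]
        ");
    }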

Expected behavior

I'd have expected the following Array (without the null):

        BooleanArray
        [
          true,
          true,
          false,
        ]

Additional context

The result of the "same" operation on an Int64Array:

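    use arrow::array::Int64Array;
    use insta::assert_debug_snapshot;

    #[test]
    fn test_int64_array_from() {
        let values = vec![Some(1), None, Some(2), Some(3)]
            .into_iter()
            .filter(Option::is_some)
            .collect::<Int64Array>();
        // Here the filtered-out element leaves no trailing null.
        assert_debug_snapshot!(values, @r"
        PrimitiveArray<Int64>
        [
          1,
          2,
          3,
        ]
        ");
    }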
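
One way to picture the difference between the two results is the toy model below. It is not Arrow's implementation, and whether `BooleanArray` actually behaves this way is only the suspicion stated in this report. Both function names are hypothetical. The first collector assumes the upper bound of `size_hint` is the final length; the second grows as items arrive.

```rust
/// Toy model (not Arrow code): trusts the upper bound of `size_hint` as the
/// final length, so slots that are never filled stay `None`.
fn collect_trusting_upper_bound<I>(iter: I) -> Vec<Option<bool>>
where
    I: Iterator<Item = Option<bool>>,
{
    let (_, upper) = iter.size_hint();
    let len = upper.expect("toy model requires an upper bound");
    // Pre-size the output using the (possibly overestimated) upper bound.
    let mut slots = vec![None; len];
    for (slot, item) in slots.iter_mut().zip(iter) {
        *slot = item;
    }
    slots
}

/// Toy model (not Arrow code): uses the hint only as a capacity and lets the
/// length follow the number of items actually yielded.
fn collect_counting_items<I>(iter: I) -> Vec<Option<bool>>
where
    I: Iterator<Item = Option<bool>>,
{
    let (lower, _) = iter.size_hint();
    let mut slots = Vec::with_capacity(lower);
    for item in iter {
        slots.push(item);
    }
    slots
}
```

Fed the filtered iterator from the reproduction, the first function returns four entries ending in `None`, which mirrors the trailing null in the BooleanArray snapshot, while the second returns the three expected entries.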
