Round trip encoding of list of fixed list fails when offset is not zero #7315

@timsaucer


Describe the bug

If you have a List of FixedSizeList and the top-level list has a non-zero offset (for example, after slicing), the array that comes back from an IPC round trip contains the wrong values in the fixed-size list. The test under To Reproduce illustrates the problem.
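To make the expected values concrete, here is a rough model in plain Rust of which leaf values each slice should carry. It is only a sketch: `ListOfFixed`, `leaf_values`, and `example` are hypothetical names made up for illustration, not arrow-rs types or functions, and it says nothing about where the writer actually goes wrong. It assumes the usual layout where the list offsets index into the fixed-size-list child, and each child entry spans `size` leaf values.

```rust
/// Hypothetical stand-in for a List<FixedSizeList<Float32, size>> array.
/// These are NOT the arrow-rs types, only the layout pieces that matter here.
struct ListOfFixed {
    /// List offsets, indexing into the fixed-size-list child.
    offsets: Vec<usize>,
    /// Width of each fixed-size list.
    size: usize,
    /// Flattened Float32 leaf values.
    values: Vec<f32>,
}

impl ListOfFixed {
    /// Leaf values that logically belong to list rows `start..start + len`.
    fn leaf_values(&self, start: usize, len: usize) -> &[f32] {
        let first_child = self.offsets[start];
        let last_child = self.offsets[start + len];
        // Each child entry covers `size` leaf values, so scale both bounds.
        &self.values[first_child * self.size..last_child * self.size]
    }
}

/// Same data as the test: one list with three points, then one list with one point.
fn example() -> ListOfFixed {
    ListOfFixed {
        offsets: vec![0, 3, 4],
        size: 3,
        values: (1..=12).map(|v| v as f32).collect(),
    }
}

fn main() {
    let array = example();

    // Equivalent of array.slice(0, 1): list offset 0.
    let first: Vec<f32> = (1..=9).map(|v| v as f32).collect();
    assert_eq!(array.leaf_values(0, 1).to_vec(), first);

    // Equivalent of array.slice(1, 1): list offset 1, so the leaf values
    // must start at child index 3, i.e. leaf index 9.
    assert_eq!(array.leaf_values(1, 1).to_vec(), vec![10.0f32, 11.0, 12.0]);

    println!("slice(1, 1) should carry {:?}", array.leaf_values(1, 1));
}
```

In other words, a correct round trip of the second slice should yield a single list holding the point [10, 11, 12], and nothing from the first three points.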

To Reproduce

Add the following test within arrow-ipc/src/writer.rs test module:

    #[test]
    fn test_roundtrip_list_of_fixed_list() -> Result<(), ArrowError> {

        let l0_builder = Float32Builder::new();
        let l1_builder = FixedSizeListBuilder::new(l0_builder, 3);
        let mut l2_builder = ListBuilder::new(l1_builder);

        for point in [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]] {
            l2_builder.values().values().append_value(point[0]);
            l2_builder.values().values().append_value(point[1]);
            l2_builder.values().values().append_value(point[2]);

            l2_builder.values().append(true);
        }
        l2_builder.append(true);

        for point in [[10., 11., 12.]] {
            l2_builder.values().values().append_value(point[0]);
            l2_builder.values().values().append_value(point[1]);
            l2_builder.values().values().append_value(point[2]);

            l2_builder.values().append(true);
        }
        l2_builder.append(true);

        let array = Arc::new(l2_builder.finish()) as ArrayRef;

        let schema = Arc::new(Schema::new_with_metadata(
            vec![Field::new(
                "points",
                DataType::List(Arc::new(Field::new(
                    "item",
                    DataType::FixedSizeList(
                        Arc::new(Field::new("item", DataType::Float32, true)),
                        3,
                    ),
                    true,
                ))),
                true,
            )],
            HashMap::default(),
        ));

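        // subarray_1 starts at list offset 0; subarray_2 starts at list
        // offset 1, which is the non-zero-offset case this test targets.
        let subarray_1 = array.slice(0, 1);
        let subarray_2 = array.slice(1, 1);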

        let b1 = RecordBatch::try_new(schema.clone(), vec![subarray_1])?;
        let b2 = RecordBatch::try_new(schema.clone(), vec![subarray_2])?;

        let mut bytes = Vec::new();
        let mut writer = StreamWriter::try_new(&mut bytes, &schema)?;
        writer.write(&b1)?;
        writer.finish()?;

        let mut cursor = std::io::Cursor::new(bytes);
        let mut reader = StreamReader::try_new(&mut cursor, None)?;
        let b1_return = reader.next().unwrap()?;

        assert_eq!(b1, b1_return);

        let mut bytes = Vec::new();
        let mut writer = StreamWriter::try_new(&mut bytes, &schema)?;
        writer.write(&b2)?;
        writer.finish()?;

        let mut cursor = std::io::Cursor::new(bytes);
        let mut reader = StreamReader::try_new(&mut cursor, None)?;
        let b2_return = reader.next().unwrap()?;

        assert_eq!(b2, b2_return);

        Ok(())
    }

Expected behavior

The deserialized Arrow array should equal the array that was serialized, including when the top-level list has a non-zero offset (b2 in the test above).

Additional context

This is probably related to #6805.

Labels: arrow (Changes to the arrow crate), bug
