concat_batches panics with total_len <= bit_len assertion for records with lists #4324

@joshg-ec

Describe the bug
concat, which concat_batches uses internally, does not appear to allocate sufficient capacity when constructing the MutableArrayData. Concatenating records that contain lists of structs results in the following panic:

assertion failed: total_len <= bit_len
thread 'concat_test' panicked at 'assertion failed: total_len <= bit_len', /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
   1: core::panicking::panic_fmt
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:114:5
   3: arrow_buffer::buffer::boolean::BooleanBuffer::new
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
   4: arrow_data::transform::_MutableArrayData::freeze::{{closure}}
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:81:25
   5: core::bool::<impl bool>::then
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/bool.rs:71:24
   6: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:80:21
   7: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
   8: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
   9: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
  10: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
  11: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
  12: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
  13: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
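
For readers unfamiliar with the assertion, here is a minimal std-only sketch of the kind of bounds check that fails at the top of the trace. The names total_len and bit_len are taken from the panic message; the function bitmap_fits and the exact arithmetic are assumptions made for illustration, not a copy of the arrow-buffer 40.0.0 source.

```rust
/// Simplified stand-in for the check behind `total_len <= bit_len`.
/// `byte_len` is the size of the backing buffer in bytes;
/// `offset` and `len` are measured in bits (one bit per array slot).
/// Hypothetical helper: this is not an arrow-buffer API.
fn bitmap_fits(byte_len: usize, offset: usize, len: usize) -> bool {
    let total_len = offset.saturating_add(len);
    let bit_len = byte_len.saturating_mul(8);
    total_len <= bit_len
}

fn main() {
    // 3 bytes hold 24 validity bits, so 24 slots fit.
    assert!(bitmap_fits(3, 0, 24));
    // A 25th slot would read past the end of the buffer.
    assert!(!bitmap_fits(3, 0, 25));
    // A bit offset counts against the same budget.
    assert!(!bitmap_fits(3, 8, 17));
}
```

Under this reading, the panic means the frozen array claimed more slots than its validity bitmap had bits for.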

To Reproduce
Call concat_batches with RecordBatches that contain lists of structs (on the order of 20–50 structs in the list per RecordBatch). If I modify the capacity calculation in concat to add a constant factor for lists, the error does not occur:

    let capacity = match d {
        DataType::Utf8 => binary_capacity::<Utf8Type>(arrays),
        DataType::LargeUtf8 => binary_capacity::<LargeUtf8Type>(arrays),
        DataType::Binary => binary_capacity::<BinaryType>(arrays),
        DataType::LargeBinary => binary_capacity::<LargeBinaryType>(arrays),
        DataType::List(_) => {
            Capacities::Array(arrays.iter().map(|a| a.len()).sum::<usize>() + 500) // <- 500 added here
        }
        _ => Capacities::Array(arrays.iter().map(|a| a.len()).sum()),
    };
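
One possible reading of why the extra headroom helps: summing a.len() over list arrays counts list rows, not the child values those rows contain. The sketch below is a std-only model of that gap. ListOffsets and totals are hypothetical names introduced here, not arrow types, and whether this gap is the actual cause of the panic is not established by this report.

```rust
/// Hypothetical stand-in for a list array; only the offsets buffer is modelled.
struct ListOffsets {
    offsets: Vec<i32>,
}

impl ListOffsets {
    /// Number of list rows (one fewer than the number of offsets).
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Number of child values spanned by the offsets.
    fn child_len(&self) -> usize {
        match (self.offsets.first(), self.offsets.last()) {
            (Some(first), Some(last)) => (last - first) as usize,
            _ => 0,
        }
    }
}

/// Returns (total list rows, total child values) across all inputs.
fn totals(arrays: &[ListOffsets]) -> (usize, usize) {
    let rows = arrays.iter().map(|a| a.len()).sum();
    let children = arrays.iter().map(|a| a.child_len()).sum();
    (rows, children)
}

fn main() {
    let arrays = [
        ListOffsets { offsets: vec![0, 20, 45] }, // 2 rows, 45 child structs
        ListOffsets { offsets: vec![0, 30] },     // 1 row, 30 child structs
    ];
    let (rows, children) = totals(&arrays);
    // A capacity sized by rows alone would be 3, while 75 child slots are needed.
    assert_eq!(rows, 3);
    assert_eq!(children, 75);
}
```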

Expected behavior
No panics when concatenating RecordBatches with lists.

Additional context
Reproduced with Arrow versions 37–40.

Labels

arrow (changes to the arrow crate), bug
