Closed
Describe the bug
concat, which is used by concat_batches, does not appear to allocate sufficient capacity when constructing the MutableArrayData. Concatenating RecordBatches that contain lists of structs results in the following panic:
assertion failed: total_len <= bit_len
thread 'concat_test' panicked at 'assertion failed: total_len <= bit_len', /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
stack backtrace:
0: rust_begin_unwind
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
1: core::panicking::panic_fmt
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
2: core::panicking::panic
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:114:5
3: arrow_buffer::buffer::boolean::BooleanBuffer::new
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
4: arrow_data::transform::_MutableArrayData::freeze::{{closure}}
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:81:25
5: core::bool::<impl bool>::then
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/bool.rs:71:24
6: arrow_data::transform::_MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:80:21
7: arrow_data::transform::MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
8: arrow_data::transform::_MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
9: arrow_data::transform::MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
10: arrow_data::transform::_MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
11: arrow_data::transform::MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
12: arrow_data::transform::_MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
13: arrow_data::transform::MutableArrayData::freeze
at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
To Reproduce
Call concat_batches with RecordBatches that contain lists of structs (on the order of 20–50 structs in the list per RecordBatch). If I modify the capacity calculation in concat to add a constant offset to the capacity for lists, the error does not occur:
    let capacity = match d {
        DataType::Utf8 => binary_capacity::<Utf8Type>(arrays),
        DataType::LargeUtf8 => binary_capacity::<LargeUtf8Type>(arrays),
        DataType::Binary => binary_capacity::<BinaryType>(arrays),
        DataType::LargeBinary => binary_capacity::<LargeBinaryType>(arrays),
        DataType::List(_) => {
            Capacities::Array(arrays.iter().map(|a| a.len()).sum::<usize>() + 500) // <- 500 added here
        }
        _ => Capacities::Array(arrays.iter().map(|a| a.len()).sum()),
    };
Expected behavior
No panics when concatenating RecordBatches with lists.
Additional context
Reproduced with Arrow versions 37–40.
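To illustrate why a capacity based only on a.len() might fall short for list columns, here is a minimal std-only sketch. It is a toy model, not arrow code: ToyListArray, row_capacity, and child_capacity are hypothetical names introduced for illustration. The assumption, which is not confirmed as the root cause, is that the number of child values in a list column can be much larger than the number of list rows.

```rust
/// Toy stand-in for a list array; only the offsets are modelled.
/// This is NOT an arrow type, just an illustration.
struct ToyListArray {
    /// offsets[i]..offsets[i + 1] is the child range of list row i.
    offsets: Vec<usize>,
}

impl ToyListArray {
    /// Build the offsets from the length of each list row.
    fn from_row_lengths(lengths: &[usize]) -> Self {
        let mut offsets = Vec::with_capacity(lengths.len() + 1);
        offsets.push(0);
        let mut total = 0;
        for len in lengths {
            total += len;
            offsets.push(total);
        }
        Self { offsets }
    }

    /// Number of list rows (what `a.len()` counts in the snippet above).
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Number of child values (for example, structs) across all rows.
    fn child_len(&self) -> usize {
        *self.offsets.last().unwrap()
    }
}

/// Capacity derived from row counts only.
fn row_capacity(arrays: &[ToyListArray]) -> usize {
    arrays.iter().map(|a| a.len()).sum()
}

/// Capacity the child data would need.
fn child_capacity(arrays: &[ToyListArray]) -> usize {
    arrays.iter().map(|a| a.child_len()).sum()
}

fn main() {
    // Two "batches": list rows holding 20, 35, and 50 child values.
    let arrays = vec![
        ToyListArray::from_row_lengths(&[20, 35]),
        ToyListArray::from_row_lengths(&[50]),
    ];
    println!(
        "rows: {}, child values: {}",
        row_capacity(&arrays),
        child_capacity(&arrays)
    );
}
```

In this toy model, the row-based figure is far smaller than the child count. That would be consistent with the constant offset above hiding the problem for small inputs rather than fixing it.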