Closed
Labels: bug (Something isn't working)
Description
Describe the bug
When using FFI table providers (`FFI_TableProvider` / `ForeignTableProvider`), queries such as `count`, for which the input schema is irrelevant, fail with an internal error because the physical and logical input schemas do not match. See the minimal reproducible example below.
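`count` needs none of the table's columns, so its scan is planned with an empty projection, and the error shown further down reports 3 physical fields against 0 logical ones. One plausible reading, assumed here rather than confirmed, is that the scan on the far side of the FFI boundary returns every column instead. The std-only sketch below models that difference; `projected_fields` is a hypothetical helper, and plain column names stand in for Arrow fields.

```rust
// Hypothetical model: column names stand in for Arrow fields.
// `None` = no projection requested, so every column is returned.
// `Some(&[])` = zero columns requested, which is all `count` needs.
fn projected_fields<'a>(all: &[&'a str], projection: Option<&[usize]>) -> Vec<&'a str> {
    match projection {
        None => all.to_vec(),
        Some(indices) => indices.iter().map(|&i| all[i]).collect(),
    }
}

fn main() {
    let table = ["a", "b", "c"];
    let no_columns: [usize; 0] = [];

    // What the logical plan for `count` expects from the scan.
    let logical = projected_fields(&table, Some(&no_columns[..]));

    // What the scan returns if the empty projection is treated as
    // "no projection" (assumed, not confirmed, for the FFI path).
    let physical = projected_fields(&table, None);

    // Prints "physical 3 vs logical 0".
    println!("physical {} vs logical {}", physical.len(), logical.len());
}
```

The point of the model is that `Some(&[])` and `None` mean different things, so any layer that conflates them changes the number of fields the scan produces.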
To Reproduce
```rust
use datafusion::arrow::{datatypes::SchemaRef, record_batch::RecordBatch};
use datafusion::{
    catalog::streaming::StreamingTable,
    error::Result,
    physical_plan::{
        stream::RecordBatchStreamAdapter, streaming::PartitionStream,
        SendableRecordBatchStream,
    },
    prelude::*,
};
use datafusion_ffi::table_provider::{FFI_TableProvider, ForeignTableProvider};
use std::sync::Arc;

/// Builds a single three-row batch with columns `a`, `b` and `c`.
fn create_partition() -> RecordBatch {
    datafusion::common::record_batch!(
        ("a", Int32, vec![1, 2, 3]),
        ("b", Float64, vec![Some(4.0), None, Some(5.0)]),
        ("c", Utf8, vec!["alpha", "beta", "gamma"])
    )
    .unwrap()
}

/// A partition stream that always yields the same batch.
#[derive(Debug)]
struct CustomStream {
    schema: SchemaRef,
}

impl Default for CustomStream {
    fn default() -> Self {
        Self {
            schema: create_partition().schema(),
        }
    }
}

impl PartitionStream for CustomStream {
    fn schema(&self) -> &SchemaRef {
        &self.schema
    }

    fn execute(
        &self,
        _ctx: Arc<datafusion::execution::TaskContext>,
    ) -> SendableRecordBatchStream {
        // Emit a single batch.
        let stream_iter = std::iter::once(Ok(create_partition()));
        let stream = futures::stream::iter(stream_iter);
        let adapter = RecordBatchStreamAdapter::new(Arc::clone(&self.schema), stream);
        Box::pin(adapter)
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    let partition_stream = Arc::new(CustomStream::default());
    let table = Arc::new(StreamingTable::try_new(
        create_partition().schema(),
        vec![partition_stream],
    )?);

    // Wrap the provider in its FFI form, then convert it back into a
    // foreign provider, as a consumer across the FFI boundary would.
    let ffi_table = FFI_TableProvider::new(table, false, None);
    let foreign_table: ForeignTableProvider = (&ffi_table).into();
    ctx.register_table("my-table", Arc::new(foreign_table))?;

    // Scanning every column works...
    ctx.table("my-table").await?.show().await?;

    // ...but `count`, which needs no columns, returns an error.
    let num_rows = ctx.table("my-table").await?.count().await?;
    println!("Found {num_rows} rows.");

    Ok(())
}
```
Displaying the table works, but `count` fails:

```
+---+-----+-------+
| a | b   | c     |
+---+-----+-------+
| 1 | 4.0 | alpha |
| 2 |     | beta  |
| 3 | 5.0 | gamma |
+---+-----+-------+
Error: Internal("Physical input schema should be the same as the one converted from logical input schema. Differences: \n\t- Different number of fields: (physical) 3 vs (logical) 0")
```
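To make the error message concrete, here is a hypothetical re-implementation of the kind of comparison that raised it. It is not DataFusion's code: fields are modelled as column names only, `schema_difference` is an illustrative name, and the wording of the name-mismatch message is made up. Only the field-count message mirrors the error above.

```rust
// Hypothetical schema comparison; NOT DataFusion's implementation.
// Returns a description of the first difference found, if any.
fn schema_difference(physical: &[&str], logical: &[&str]) -> Option<String> {
    if physical.len() != logical.len() {
        return Some(format!(
            "Different number of fields: (physical) {} vs (logical) {}",
            physical.len(),
            logical.len()
        ));
    }
    // Same length: report the first position whose names differ.
    physical
        .iter()
        .zip(logical)
        .position(|(p, l)| p != l)
        .map(|i| {
            format!(
                "Different field name at index {}: (physical) {} vs (logical) {}",
                i, physical[i], logical[i]
            )
        })
}

fn main() {
    // The FFI scan produced all three columns; `count` planned none.
    let physical = ["a", "b", "c"];
    let logical: [&str; 0] = [];
    if let Some(diff) = schema_difference(&physical, &logical) {
        // Prints "Different number of fields: (physical) 3 vs (logical) 0".
        println!("{diff}");
    }
}
```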
Expected behavior
Queries such as `count` should behave the same through an FFI table provider as they do when the provider is registered directly, without FFI.
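If the mismatch comes from an empty projection being read back as "no projection" on the far side of the FFI boundary (an assumption, not confirmed in this report), the expected behavior amounts to the projection surviving the round trip unchanged. The sketch below models that property with a hypothetical `WireProjection` type and `encode`/`decode` helpers; none of these come from `datafusion-ffi`. The `Option<&Vec<usize>>` input mirrors the `projection` parameter of DataFusion's `TableProvider::scan`.

```rust
// Hypothetical wire format for passing a projection across a boundary
// such as FFI. A bare list of indices cannot tell "no projection" (None)
// apart from "zero columns" (Some(vec![])), so a presence flag is carried.
#[derive(Debug, PartialEq)]
struct WireProjection {
    present: bool,
    indices: Vec<usize>,
}

fn encode(projection: Option<&Vec<usize>>) -> WireProjection {
    match projection {
        Some(indices) => WireProjection { present: true, indices: indices.clone() },
        None => WireProjection { present: false, indices: Vec::new() },
    }
}

fn decode(wire: &WireProjection) -> Option<Vec<usize>> {
    wire.present.then(|| wire.indices.clone())
}

// Lossy alternative for comparison: an empty list is read back as None,
// so a zero-column request turns into "return every column".
fn decode_lossy(indices: &[usize]) -> Option<Vec<usize>> {
    if indices.is_empty() { None } else { Some(indices.to_vec()) }
}

fn main() {
    // The projection a `count` query needs: zero columns.
    let count_projection: Option<Vec<usize>> = Some(vec![]);
    let round_trip = decode(&encode(count_projection.as_ref()));

    // Prints "lossless: Some([]), lossy: None".
    println!("lossless: {:?}, lossy: {:?}", round_trip, decode_lossy(&[]));
}
```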
Additional context
No response