Closed
Labels: bug
Description
Describe the bug
```rust
let values = vec![vec![lit(1).alias("column1"), lit("hello").alias("column2")]];
let left = LogicalPlanBuilder::values(values.clone())?
    .alias("left")?
    .build()?;
let right = LogicalPlanBuilder::values(values)?
    .alias("right")?
    .build()?;
let join = LogicalPlanBuilder::from(left)
    .join_with_expr_keys(
        right,
        JoinType::Left,
        (vec![col("left.column1")], vec![col("right.column1")]),
        None,
    )?
    .build()?;
let plan = LogicalPlanBuilder::from(join)
    .project(vec![lit("hello").alias("column1"), col("left.column1")])?
    .build()?;
```

Fails with:

```
Error: SchemaError(AmbiguousReference { field: Column { relation: Some(Bare { table: "left" }), name: "column1" } }, Some(""))
```
This is particularly important when DataFusion converts Substrait plans, since column names and aliases are stripped.
Consider the following case: create a null string column before a join and call it "column1", join the dataset, and then, in a projection, construct a new column called "column2" that is also a null string. Because the aliases are stripped, the schema after the join and projection has a Utf8(NULL) column from the left input (relation: left) and another Utf8(NULL) column with no relation.
We could work around this in the Substrait case by aliasing literals with a UUID, but the same problem could still arise for any expression whose default name does not depend on the columns it uses (perhaps the current timestamp, for example).
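A minimal sketch of the UUID-aliasing idea, for illustration only. `unique_alias` is a hypothetical helper, not part of DataFusion or its Substrait consumer, and it swaps a process-wide counter in for the UUID so the example needs no external crates. It only shows that two expressions sharing a default name (here `Utf8(NULL)`) can be given distinct output names; as noted above, it does nothing for expressions that are never aliased.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: derive a unique alias from an expression's default
/// display name. The issue suggests a UUID; a process-wide atomic counter is
/// used instead so this sketch depends only on the standard library.
fn unique_alias(default_name: &str) -> String {
    static NEXT_ID: AtomicU64 = AtomicU64::new(0);
    // Each call takes the next counter value, so no two results are equal.
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{default_name}#{id}")
}

fn main() {
    // Two null string literals would both default to "Utf8(NULL)".
    let left_name = unique_alias("Utf8(NULL)");
    let projected_name = unique_alias("Utf8(NULL)");
    assert_ne!(left_name, projected_name);
    println!("{left_name} {projected_name}");
}
```

Presumably the generated name would then be attached with `.alias(...)`, the same way the reproduction above uses `lit(...).alias(...)`.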
The error comes from this check in dfschema.rs:

```rust
for (qualifier, name) in qualified_names {
    if unqualified_names.contains(name) {
        return _schema_err!(SchemaError::AmbiguousReference {
            field: Box::new(Column::new(Some(qualifier.clone()), name))
        });
    }
}
```
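To make the failure mode concrete, here is a std-only model of the check quoted above. The `Field` struct and `check_ambiguity` function are hypothetical simplifications that track only (qualifier, name) pairs; they are not DataFusion's `DFSchema` API. Feeding it the projection from the reproduction (an unqualified `column1` plus `left.column1`) reports the same ambiguity.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a schema column: an optional relation qualifier
/// plus a name. Only names are modelled; data types are ignored.
#[derive(Debug)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

impl Field {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Field {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }
}

/// Mirrors the quoted loop: a qualified field is reported as an ambiguous
/// reference if any unqualified field has the same name.
fn check_ambiguity(fields: &[Field]) -> Result<(), String> {
    let unqualified_names: HashSet<&str> = fields
        .iter()
        .filter(|f| f.qualifier.is_none())
        .map(|f| f.name.as_str())
        .collect();
    for f in fields {
        if let Some(q) = &f.qualifier {
            if unqualified_names.contains(f.name.as_str()) {
                return Err(format!("AmbiguousReference: {q}.{}", f.name));
            }
        }
    }
    Ok(())
}

fn main() {
    // Projection output from the reproduction:
    // lit("hello").alias("column1") and col("left.column1").
    let fields = vec![
        Field::new(None, "column1"),
        Field::new(Some("left"), "column1"),
    ];
    // prints Err("AmbiguousReference: left.column1")
    println!("{:?}", check_ambiguity(&fields));
}
```

The Substrait case behaves the same way: naming both columns `Utf8(NULL)` (one qualified with `left`, one unqualified) also returns an error, while two columns that are both qualified (`left.column1` and `right.column1`) pass this particular check.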
To Reproduce
No response
Expected behavior
No response
Additional context
No response