AmbiguousReference on project following join when expression name matches left or right column name #17294

@xanderbailey

Description

Describe the bug

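    // Imports assumed for this snippet (from the `datafusion` crate); the `?`
    // operators require an enclosing function that returns a Result.
    use datafusion::logical_expr::{col, lit, JoinType, LogicalPlanBuilder};

    let values = vec![vec![lit(1).alias("column1"), lit("hello").alias("column2")]];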

    let left = LogicalPlanBuilder::values(values.clone())?
        .alias("left")?
        .build()?;

    let right = LogicalPlanBuilder::values(values)?
        .alias("right")?
        .build()?;

    let join = LogicalPlanBuilder::from(left)
        .join_with_expr_keys(
            right,
            JoinType::Left,
            (vec![col("left.column1")], vec![col("right.column1")]),
            None,
        )?
        .build()?;

    let plan = LogicalPlanBuilder::from(join)
        .project(vec![lit("hello").alias("column1"), col("left.column1")])?
        .build()?;

Fails with:

Error: SchemaError(AmbiguousReference { field: Column { relation: Some(Bare { table: "left" }), name: "column1" } }, Some(""))

This is particularly important when DataFusion converts Substrait plans, since column names and aliases are stripped.

Consider the following case:

Create a null string column before a join and call it "column1", join the dataset, and then construct a new column in a projection that is also a null string column, called "column2". Because the names are stripped, the schema after the join has a UTF8(NULL) field from the left side (relation: left) and another UTF8(NULL) field with no relation.

We could fix that for the Substrait case by aliasing literals with a UUID, but this could still happen for any expression whose default name doesn't depend on the columns it uses (maybe the current timestamp?).
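To make the aliasing idea concrete, here is a minimal, std-only sketch. It is not DataFusion or Substrait consumer code: `unique_name` is a hypothetical helper, and it swaps the UUID suggested above for a numeric suffix so the example stays self-contained. It only illustrates deduplicating output names before they reach the schema check.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): returns `base` if it has not
/// been used yet, otherwise appends `_1`, `_2`, ... until the name is unique.
/// Every returned name is recorded in `used`.
fn unique_name(base: &str, used: &mut HashSet<String>) -> String {
    let mut candidate = base.to_string();
    let mut n = 1;
    // `insert` returns false when the name is already present.
    while !used.insert(candidate.clone()) {
        candidate = format!("{base}_{n}");
        n += 1;
    }
    candidate
}

fn main() {
    let mut used: HashSet<String> = HashSet::new();
    // Name already taken by the column coming from the left side of the join.
    used.insert("column1".to_string());

    // The projected literal would otherwise also be named "column1".
    let alias = unique_name("column1", &mut used);
    println!("{alias}");
}
```

A suffix (or UUID) scheme like this only helps if it is applied to every expression name, not just literals, which is the caveat raised above.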

Comes from:

        for (qualifier, name) in qualified_names {
            if unqualified_names.contains(name) {
                return _schema_err!(SchemaError::AmbiguousReference {
                    field: Box::new(Column::new(Some(qualifier.clone()), name))
                });
            }
        }

in dfschema.rs
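The effect of that loop can be reproduced with a small standalone model. This is a sketch under stated assumptions, not the DataFusion implementation: `Field` and `find_ambiguous` are hypothetical stand-ins that mirror only the quoted check (a qualified field whose name also appears among the unqualified fields is reported as ambiguous).

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for a schema field: optional relation plus a name.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

impl Field {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Field {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }
}

/// Mirrors only the quoted loop: returns the first qualified field whose
/// name is also used by an unqualified field, if there is one.
fn find_ambiguous(fields: &[Field]) -> Option<Field> {
    let unqualified: BTreeSet<&str> = fields
        .iter()
        .filter(|f| f.qualifier.is_none())
        .map(|f| f.name.as_str())
        .collect();
    fields
        .iter()
        .find(|f| f.qualifier.is_some() && unqualified.contains(f.name.as_str()))
        .cloned()
}

fn main() {
    // Output fields of the failing projection:
    //   lit("hello").alias("column1")  -> unqualified "column1"
    //   col("left.column1")            -> "column1" qualified by "left"
    let projected = vec![
        Field::new(None, "column1"),
        Field::new(Some("left"), "column1"),
    ];
    println!("{:?}", find_ambiguous(&projected));

    // Giving the literal a different alias avoids the clash in this model.
    let renamed = vec![
        Field::new(None, "greeting"),
        Field::new(Some("left"), "column1"),
    ];
    println!("{:?}", find_ambiguous(&renamed));
}
```

In this model the first projection is flagged on left.column1, matching the field reported in the error above, while the renamed projection passes.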

Labels: bug