Closed
Labels
bug (Something isn't working)
Description
Describe the bug
Calling with_column twice generates an error when the second column is a window expression.
df
    .with_column("foo", <normal_expr>)
    .with_column("bar", <window_expr>)
Because "foo" does not have a qualifier, the second call to with_column ends up aliasing it to "bar" as well.
datafusion/datafusion/core/src/dataframe/mod.rs
Lines 1466 to 1479 in 3ece7a7
let mut fields: Vec<Expr> = plan
    .schema()
    .iter()
    .map(|(qualifier, field)| {
        if field.name() == name {
            col_exists = true;
            new_column.clone()
        } else if window_func && qualifier.is_none() {
            col(Column::from((qualifier, field))).alias(name)
        } else {
            col(Column::from((qualifier, field)))
        }
    })
    .collect();
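The branching in the snippet above can be modelled without DataFusion to see why two output columns end up with the same name. This is only a simplified sketch: the `Field` tuple, the `t` qualifier, the shortened `row_number()` field name, and the helpers `project_names` and `first_duplicate` are all hypothetical stand-ins, not DataFusion APIs.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a schema field: (optional qualifier, field name).
type Field = (Option<&'static str>, &'static str);

/// Mirrors the three branches of the quoted `map` closure,
/// but returns only the output column names.
fn project_names(schema: &[Field], name: &str, window_func: bool) -> Vec<String> {
    schema
        .iter()
        .map(|(qualifier, field_name)| {
            if *field_name == name {
                // An existing column with this name is replaced by the new one.
                name.to_string()
            } else if window_func && qualifier.is_none() {
                // Every unqualified field is aliased to `name`,
                // not only the field produced by the window expression.
                name.to_string()
            } else {
                field_name.to_string()
            }
        })
        .collect()
}

/// Returns the positions of the first two columns that share a name, if any.
fn first_duplicate(names: &[String]) -> Option<(usize, usize)> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    for (i, n) in names.iter().enumerate() {
        if let Some(&j) = seen.get(n.as_str()) {
            return Some((j, i));
        }
        seen.insert(n.as_str(), i);
    }
    None
}

fn main() {
    // Assumed schema after `with_column("s", ...)` and the window plan:
    // three qualified columns, then two unqualified ones.
    let schema: Vec<Field> = vec![
        (Some("t"), "c1"),
        (Some("t"), "c2"),
        (Some("t"), "c3"),
        (None, "s"),
        (None, "row_number()"),
    ];
    let names = project_names(&schema, "r", true);
    println!("{:?}", names);
    println!("{:?}", first_duplicate(&names));
}
```

In this model both unqualified fields are renamed to "r", and the clash is reported at positions 3 and 4, the same positions named in the error message.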
Error: Plan("Projections require unique expression names but the expression \"s AS r\" at position 3 and \"row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS r\" at position 4 have the same name. Consider aliasing (\"AS\") one of them.")
To Reproduce
Update test_window_function_with_column to first call with_column with any expression.
For example:
// Test issue: https://github.com/apache/datafusion/issues/11982
// Window function was creating unwanted projection when using with_column() method.
#[tokio::test]
async fn test_window_function_with_column() -> Result<()> {
    let df = test_table().await?.select_columns(&["c1", "c2", "c3"])?;
    let ctx = SessionContext::new();
    let df_impl = DataFrame::new(ctx.state(), df.plan.clone());
    let func = row_number().alias("row_num");
    // This first `with_column` results in a column without a `qualifier`
    let df_impl = df_impl.with_column("s", col("c2") + col("c3"))?;
    // This second `with_column` then assigns `"r"` alias to the above column and the window function
    // Should create an additional column with alias 'r' that has window func results
    let df = df_impl.with_column("r", func)?.limit(0, Some(2))?;
    assert_eq!(4, df.schema().fields().len());
    let df_results = df.clone().collect().await?;
    assert_batches_sorted_eq!(
        [
            "+----+----+-----+---+",
            "| c1 | c2 | c3  | r |",
            "+----+----+-----+---+",
            "| c  | 2  | 1   | 1 |",
            "| d  | 5  | -40 | 2 |",
            "+----+----+-----+---+",
        ],
        &df_results
    );
    Ok(())
}
Expected behavior
I would expect the second call to succeed and the final dataframe to have columns c1, c2, c3, s, and r.
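As an illustration of that expectation, here is a minimal sketch of one way the column names could come out as c1, c2, c3, s, r. It assumes the window plan adds exactly one field and that this field can be identified by name. The `Field` tuple, the `t` qualifier, the `row_number()` field name, and `project_names_by_window_field` are hypothetical, and this is not presented as the fix adopted in DataFusion.

```rust
// Hypothetical stand-in for a schema field: (optional qualifier, field name).
type Field = (Option<&'static str>, &'static str);

/// Hypothetical variant: only the field produced by the window expression
/// (identified by its name, `window_field`) receives the alias `name`.
/// Other unqualified fields, such as "s", pass through unchanged.
fn project_names_by_window_field(
    schema: &[Field],
    name: &str,
    window_field: Option<&str>,
) -> Vec<String> {
    schema
        .iter()
        .map(|(_qualifier, field_name)| {
            if *field_name == name || Some(*field_name) == window_field {
                name.to_string()
            } else {
                field_name.to_string()
            }
        })
        .collect()
}

fn main() {
    // Assumed schema after `with_column("s", ...)` and the window plan.
    let schema: Vec<Field> = vec![
        (Some("t"), "c1"),
        (Some("t"), "c2"),
        (Some("t"), "c3"),
        (None, "s"),
        (None, "row_number()"),
    ];
    let names = project_names_by_window_field(&schema, "r", Some("row_number()"));
    println!("{:?}", names);
}
```

Matching on the window field's name rather than on a missing qualifier is just one design option; it keeps the earlier unqualified column "s" out of the aliasing branch.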
Additional context
#12000 introduced that conditional (the window_func && qualifier.is_none() branch).