Skip to content

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Oct 14, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

Apache Spark concat has some differences comparing to Datafusion, namely:

  • Supports zero arguments
  • Returns NULL if any of inputs are null

What changes are included in this PR?

3-step approach:
- Compute NULL mask from incoming arguments
- Delegate to DataFusion's concat for the actual concatenation
- Apply NULL mask to the result

Are these changes tested?

Are there any user-facing changes?

@comphead comphead added the spark label Oct 14, 2025
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate labels Oct 14, 2025
@github-actions github-actions bot removed sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate labels Oct 14, 2025
@comphead comphead marked this pull request as ready for review October 15, 2025 17:56
@alamb alamb mentioned this pull request Oct 15, 2025
16 tasks
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thanks @comphead

I suspect we can make this function faster but it can be done as a follow on PR

let arrays = arrays?;

// Compute NULL mask
let mut null_mask = vec![false; array_len];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably compute the null mask more efficiently using a NullBullfer::union: https://docs.rs/arrow/latest/arrow/buffer/struct.NullBuffer.html#method.union

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb I'm planning to merge it to hop into the 50.3.0 release and then create a small follow up on the comment

@comphead comphead added this pull request to the merge queue Oct 15, 2025
Merged via the queue into apache:main with commit 264030c Oct 15, 2025
28 checks passed
dariocurr pushed a commit to dariocurr/datafusion that referenced this pull request Oct 16, 2025
* chore: Extend backtrace coverage

* fmt

* part2

* feedback

* clippy

* feat: support Spark `concat`

* clippy

* comments

* test

* doc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spark sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants