Skip to content

Conversation

@comphead
Copy link
Contributor

@comphead comphead commented Mar 1, 2024

Which issue does this PR close?

Closes #.
Related to #9055

Rationale for this change

Expanding IGNORE NULLS support for LEAD window function

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Mar 1, 2024
@comphead comphead changed the title IGNORE NULLS support for LEAD Implement IGNORE NULLS support for LEAD Mar 1, 2024
@comphead comphead marked this pull request as draft March 1, 2024 22:41
@comphead comphead changed the title Implement IGNORE NULLS support for LEAD Support IGNORE NULLS for LEAD window function Mar 1, 2024
3
24
14
78 50
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only test is failing....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mustafasrepo setting batch size to extreme low value will affect the window range?

@comphead comphead marked this pull request as ready for review March 2, 2024 18:35
@comphead comphead requested a review from mustafasrepo March 2, 2024 18:36
# result should be same with above, when lag algorithm work with pruned data.
# decreasing batch size, causes data to be produced in smaller chunks at the source.
# Hence sliding window algorithm is used during calculations.
# LEAD function to be added, there is some bug with incoming data array when LEAD is called
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll address this in following PR, what I found is the incoming data is incomplete for LEAD.
so for set datafusion.execution.batch_size = 8000;
the incoming array is complete

[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
  "x",
  null,
  "b",
  null,
]

and for set datafusion.execution.batch_size = 1;
the data is partially missed

[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
  "x",
  null,
  "b"
]

Probably the error is in built_in.rs I'll check it in following PR

Copy link
Contributor

@mustafasrepo mustafasrepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @comphead for this PR. It is LGTM!. Regarding failing test, you mentioned. The reason for its failure is that BoundedWindowExecutor uses get_range method to calculate required buffer for the result calculation. For most of the window function that uses window frame, get_range is not used. Because required range can be determined from window frame. However, for some builtin window functions window frame is not used such as LEAD, LAG, ROW_NUMBER. For these ones get_range should be implemented to make sure that required buffer is always present for the result. I think bug is related to the get_range implementation for the LEAD. I will file a subsequent PR to solve this problem.

@mustafasrepo mustafasrepo merged commit dd1cf01 into apache:main Mar 4, 2024
wiedld pushed a commit to wiedld/arrow-datafusion that referenced this pull request Mar 21, 2024
* IGNORE NULLS support for LEAD

* fix

* fix

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants