@nik9000
Member

@nik9000 nik9000 commented Dec 5, 2024

This reworks `Expressions#isNull` so it only matches the `null` literal.
This doesn't super change the way ESQL works because we already rewrite
things that `fold` into `null` into the `null` literal. It's just that,
now, `isNull` won't return `true` for things that *fold* to null - only
things that have *already* folded to null.

This is important because `fold` can be quite expensive so we're better
off keeping the results of it when possible. Which is what the constant
folding rules *do*.
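
To make the distinction concrete, here is a minimal sketch using hypothetical stand-in types. `Expression`, `Literal`, `Concat`, `isNullByFolding`, and `constantFold` are invented for illustration and are not the real ESQL classes. It assumes the old check folded the expression and discarded the result, while the new check only inspects a literal that already exists. The real method may also look at the data type, which is omitted here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-ins for ESQL's expression classes; not the real API.
final class NullCheckSketch {
    // Counts how often a non-literal expression gets folded, to make the cost visible.
    static final AtomicInteger FOLDS = new AtomicInteger();

    interface Expression {
        boolean foldable();
        Object fold();
    }

    // A constant that is already folded; reading its value is free.
    record Literal(Object value) implements Expression {
        public boolean foldable() { return true; }
        public Object fold() { return value; }
    }

    // A foldable expression whose fold() stands in for expensive work.
    record Concat(Expression left, Expression right) implements Expression {
        public boolean foldable() { return left.foldable() && right.foldable(); }
        public Object fold() {
            FOLDS.incrementAndGet();
            Object l = left.fold();
            Object r = right.fold();
            return (l == null || r == null) ? null : l.toString() + r;
        }
    }

    // Old-style check (assumed): folds the expression and throws the result away.
    static boolean isNullByFolding(Expression e) {
        return e.foldable() && e.fold() == null;
    }

    // New-style check: only recognizes a literal that already holds null.
    static boolean isGuaranteedNull(Expression e) {
        return e instanceof Literal lit && lit.value() == null;
    }

    // What a constant-folding rule does: fold once and keep the result as a literal.
    static Expression constantFold(Expression e) {
        return (e.foldable() && !(e instanceof Literal)) ? new Literal(e.fold()) : e;
    }
}
```

With these definitions, a `Concat` of `"a"` and a null literal is reported as null by `isNullByFolding`, but `isGuaranteedNull` only returns `true` once `constantFold` has replaced it with a `Literal`.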
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 5, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Member Author

nik9000 commented Dec 6, 2024

So my statement:

> This doesn't super change the way ESQL works because we already rewrite things that fold into null into the null literal.

Is how I believe we work. But I think folks who own various bits of code should double check me.

@astefan astefan self-requested a review December 6, 2024 14:29
  // P.S. this could be done inside the Aggregate but this place better centralizes the logic
  if (e instanceof AggregateFunction agg) {
-     if (Expressions.isNull(agg.filter())) {
+     if (Expressions.isGuaranteedNull(agg.filter())) {
Member Author

I'm pretty sure the changes to this file are fine because, if we don't fold to null immediately, we will fold constants, which will yield literal null, which will then fold null here.
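
The argument above relies on the optimizer re-running its rules until nothing changes. A minimal sketch of that loop follows, under stated assumptions: all types and rule bodies here are hypothetical stand-ins, `AddNull` represents any foldable expression that evaluates to null, and rewriting a null filter to `false` is an assumption made for illustration (the snippet does not show what the real rule returns).

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified model of an optimizer batch; not the real ESQL classes.
final class FixpointSketch {
    interface Expression {}

    record Literal(Object value) implements Expression {}

    // Stands in for a foldable expression that evaluates to null.
    record AddNull() implements Expression {}

    // An aggregate whose filter decides which rows it sees.
    record Agg(String name, Expression filter) implements Expression {}

    static boolean isGuaranteedNull(Expression e) {
        return e instanceof Literal lit && lit.value() == null;
    }

    // Assumption: a null filter matches no rows, so it is rewritten to false.
    static Expression foldNull(Expression e) {
        return (e instanceof Agg agg && isGuaranteedNull(agg.filter()))
            ? new Agg(agg.name(), new Literal(false))
            : e;
    }

    // Replaces a foldable filter with the literal it folds to.
    static Expression constantFolding(Expression e) {
        return (e instanceof Agg agg && agg.filter() instanceof AddNull)
            ? new Agg(agg.name(), new Literal(null))
            : e;
    }

    // Runs the rules in order, repeating until a full pass changes nothing.
    static Expression optimize(Expression e, List<UnaryOperator<Expression>> rules) {
        Expression current = e;
        while (true) {
            Expression next = current;
            for (UnaryOperator<Expression> rule : rules) {
                next = rule.apply(next);
            }
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
    }
}
```

With the rules ordered `foldNull` then `constantFolding`, the first pass only produces the null literal and the second pass lets `foldNull` act on it. Listing `constantFolding` first reaches the same result in one fewer pass.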

  @Override
  public Object fold() {
-     if (Expressions.isNull(value) || list.stream().allMatch(Expressions::isNull)) {
+     if (Expressions.isGuaranteedNull(value) || list.stream().allMatch(Expressions::isGuaranteedNull)) {
Member Author

I'm curious about this one. I feel like it's tricky to be sure this is right.

Member Author

I'm going to try removing this line from ESQL entirely to see what happens.... Learning!

Member Author

Only one test fails if I zap this - testFoldNullListInToLocalRelation - and I don't really have the background to know what it covers. I don't think we should remove this, but I want to make sure it still runs properly this way.
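
For context, the guard in this snippet is consistent with how SQL defines `IN` under three-valued logic: a null value, or a list made only of nulls, can never produce a match, so the result is null (unknown). The sketch below is a hypothetical helper, `foldIn`, that works on plain Java values rather than expressions. Whether ESQL's `In` follows these semantics in every detail is an assumption.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper operating on plain values; not the real In#fold implementation.
final class InFoldSketch {
    // Returns TRUE, FALSE, or null (unknown), following SQL's three-valued IN.
    static Boolean foldIn(Object value, List<?> list) {
        // Short-circuit mirrored from the snippet: nothing can match.
        if (value == null || list.stream().allMatch(Objects::isNull)) {
            return null;
        }
        boolean sawNull = false;
        for (Object candidate : list) {
            if (candidate == null) {
                sawNull = true;
            } else if (candidate.equals(value)) {
                return Boolean.TRUE;
            }
        }
        // No match: a null in the list makes the answer unknown rather than false.
        return sawNull ? null : Boolean.FALSE;
    }
}
```

The short-circuit only saves work. In this sketch, removing it gives the same results for a non-empty list that contains only nulls, because the loop then falls through to the `sawNull` case; only the null-value case would need separate handling.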

      return filter.child();
  }
- if (FALSE.equals(condition) || Expressions.isNull(condition)) {
+ if (FALSE.equals(condition) || Expressions.isGuaranteedNull(condition)) {
Member Author

I'm pretty sure these are fine because we'll have rewritten all null-valued expressions into literal nulls by the time we make it here. But I'd love it if someone could confirm that we have a test for the plan bits.

Member Author

And that someone could be me....

@astefan
Contributor

astefan commented Dec 6, 2024

I am a bit surprised that the PR passes :-))).

I'll have a deeper look, though. There are many use cases related to aggregations that we only sort-of support now - see #112392. That PR is a daring attempt at fixing the missing bits, but aggregations folding (as a concept) needs to happen, otherwise that PR introduces way too many workarounds just to make some use cases work. I will close it and create a sort of meta issue for constant folding instead.

Leaving that PR aside, the change here with isGuaranteedNull expects the method to be used at the "right" time in the planning process where, as you say, things are already folded to a Literal. I need to take some examples for a spin and try to break the flow.

@nik9000
Member Author

nik9000 commented Dec 6, 2024

> expects the method to be used at the "right" time

Yeah! I hate that. But I think the alternative is throwing away a bunch of work after folding. Sometimes allocating a lot of stuff.

Member

@costin costin left a comment

LGTM

Contributor

@astefan astefan left a comment

I've given some more thought to this. Conceptually, I think isNull was doing too much by temporarily folding the value and checking it, whereas the actual folding of the expression (and its replacement with null in the entire tree) was happening elsewhere (Literal.of(e) via ConstantFolding, or right here in FoldNull).

Contributor

@bpintea bpintea left a comment

Wondering what prompted the change. Repeatedly folding (and throwing away the results) is wasteful, but not that many rules get to run before isNull that do fold.

But at this scale, maybe placing ConstantFolding (and maybe PartiallyFoldCase) ahead of FoldNull in the optimiser would also make sense -- maybe push these three to be the first rules?

* into a {@link Literal} containing {@code null} which will return
* {@code true} from here.
*/
public static boolean isGuaranteedNull(Expression e) {
Contributor

Nit: isGuaranteedNull makes me wonder what is "null" and, more relevantly, when it is not guaranteed :)
I'd find something like isNullTypeOrLiteral clearer, but just a nitty nit.

Member Author

Fair enough. I think I'm going to keep it because everything feels not so great. Maybe valueIsGuaranteedNull or something? I'm not sure that makes it more descriptive.

@nik9000 nik9000 added the auto-backport Automatically create backport pull requests when merged label Dec 6, 2024
@nik9000
Member Author

nik9000 commented Dec 6, 2024

Seeing approvals, I'm going to merge this so I can build on it. If anyone can come up with any fun extra test cases, let's make a followup. And

> But at this scale, maybe placing ConstantFolding (and maybe PartiallyFoldCase) ahead of FoldNull in the optimiser would also make sense -- maybe push these three to be the first rules?

I think that's a great idea! Should I just try it myself?

@nik9000 nik9000 merged commit 8c38007 into elastic:main Dec 6, 2024
16 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Dec 6, 2024
elasticsearchmachine pushed a commit that referenced this pull request Dec 6, 2024

Labels

:Analytics/Compute Engine Analytics in ES|QL auto-backport Automatically create backport pull requests when merged >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0

6 participants