-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Fixes esql class cast bug in STATS at planning level #137511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 17 commits
53f9dbd
c66b6b5
5f7e040
ad46b64
f1fd40e
94682b3
bce1940
71c1691
709717c
7c99692
c3cfc7d
15216e2
6981f16
f6e5316
f968149
e1695a0
a1935ea
d4bed3f
c483a52
219c8a8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 137511 | ||
| summary: Fixes esql class cast bug in STATS at planning level | ||
| area: ES|QL | ||
| type: bug | ||
| issues: [] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,6 +19,7 @@ | |
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.CombineLimitTopN; | ||
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.CombineProjections; | ||
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.ConstantFolding; | ||
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.DeduplicateAggs; | ||
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.ExtractAggregateCommonFilter; | ||
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.FoldNull; | ||
| import org.elasticsearch.xpack.esql.optimizer.rules.logical.HoistRemoteEnrichLimit; | ||
|
|
@@ -178,6 +179,14 @@ protected static Batch<LogicalPlan> operators() { | |
| new SplitInWithFoldableValue(), | ||
| new PropagateEvalFoldables(), | ||
| new ConstantFolding(), | ||
| /* Then deduplicate aggregations | ||
| We need this after the constant folding | ||
| because we could have expressions like | ||
| count_distinct(_, 9 + 1) | ||
| count_distinct(_, 10) | ||
| which are semantically identical | ||
| */ | ||
| new DeduplicateAggs(), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting! Is the constant folding required to run before? Will this take care of cases that have I'd have thought that maybe the aggs substitution needs to happen before we replace agg expressions with evals. But this is placing deduplication pretty late into the optimization. It's not wrong, though! But maybe a bit unfortunate that we'll have 2 rules that dedupe; and we can't assume that aggs come out of the substitutions batch already deduped. But maybe that was never correct to assume.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Constant folding in aggregation is an endemic problem and tech debt. Some insights here #112392 (comment). I don't think we could realistically do the right thing now with constant folding in aggs (due to priorities), and there are already optimization stuff we duplicate because of this combination of surrogate expressions in aggs + constant folding issues - see being called twice in |
||
| new PartiallyFoldCase(), | ||
| // boolean | ||
| new BooleanSimplification(), | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| package org.elasticsearch.xpack.esql.optimizer.rules.logical; | ||
|
|
||
| /** | ||
| * This rule handles duplicate aggregate functions to avoid duplicate compute | ||
| * stats a = min(x), b = min(x), c = count(*), d = count() by g | ||
| * becomes | ||
| * stats a = min(x), c = count(*) by g | eval b = a, d = c | keep a, b, c, d, g | ||
| */ | ||
| public final class DeduplicateAggs extends ReplaceAggregateAggExpressionWithEval implements OptimizerRules.CoordinatorOnly { | ||
|
|
||
| public DeduplicateAggs() { | ||
| super(false); | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,7 +43,14 @@ | |
| * becomes | ||
| * stats a = min(x), c = count(*) by g | eval b = a, d = c | keep a, b, c, d, g | ||
| */ | ||
| public final class ReplaceAggregateAggExpressionWithEval extends OptimizerRules.OptimizerRule<Aggregate> { | ||
| public class ReplaceAggregateAggExpressionWithEval extends OptimizerRules.OptimizerRule<Aggregate> { | ||
| private boolean replaceNestedExpressions = true; | ||
|
||
|
|
||
| public ReplaceAggregateAggExpressionWithEval(boolean replaceNestedExpressions) { | ||
| super(OptimizerRules.TransformDirection.UP); | ||
| this.replaceNestedExpressions = replaceNestedExpressions; | ||
| } | ||
|
|
||
| public ReplaceAggregateAggExpressionWithEval() { | ||
| super(OptimizerRules.TransformDirection.UP); | ||
| } | ||
|
|
@@ -88,7 +95,7 @@ protected LogicalPlan rule(Aggregate aggregate) { | |
| // common case - handle duplicates | ||
| if (child instanceof AggregateFunction af) { | ||
| // canonical representation, with resolved aliases | ||
| AggregateFunction canonical = (AggregateFunction) af.canonical().transformUp(e -> aliases.resolve(e, e)); | ||
| AggregateFunction canonical = getCannonical(af, aliases); | ||
|
|
||
| Alias found = rootAggs.get(canonical); | ||
| // aggregate is new | ||
|
|
@@ -106,14 +113,15 @@ protected LogicalPlan rule(Aggregate aggregate) { | |
| } | ||
| // nested expression over aggregate function or groups | ||
| // replace them with reference and move the expression into a follow-up eval | ||
| else { | ||
| else if (replaceNestedExpressions) { | ||
| changed.set(true); | ||
| Expression aggExpression = child.transformUp(AggregateFunction.class, af -> { | ||
| AggregateFunction canonical = (AggregateFunction) af.canonical(); | ||
| // canonical representation, with resolved aliases | ||
| AggregateFunction canonical = getCannonical(af, aliases); | ||
| Alias alias = rootAggs.get(canonical); | ||
| if (alias == null) { | ||
| // create synthetic alias ove the found agg function | ||
| alias = new Alias(af.source(), syntheticName(canonical, child, counter[0]++), canonical, null, true); | ||
| // create synthetic alias over the found agg function | ||
| alias = new Alias(af.source(), syntheticName(canonical, child, counter[0]++), af.canonical(), null, true); | ||
| // and remember it to remove duplicates | ||
| rootAggs.put(canonical, alias); | ||
| // add it to the list of aggregates and continue | ||
|
|
@@ -132,6 +140,9 @@ protected LogicalPlan rule(Aggregate aggregate) { | |
| Alias alias = as.replaceChild(aggExpression); | ||
| newEvals.add(alias); | ||
| newProjections.add(alias.toAttribute()); | ||
| } else { | ||
| newAggs.add(agg); | ||
| newProjections.add(agg.toAttribute()); | ||
|
Comment on lines
+144
to
+146
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This block is doing nothing really because it only gets triggered if But I'd rather have this safeguard (just leaving that aggregation as it is) than risking something important to disappear in the plan. |
||
| } | ||
| } | ||
| // not an alias (e.g. grouping field) | ||
|
|
@@ -155,6 +166,10 @@ protected LogicalPlan rule(Aggregate aggregate) { | |
| return plan; | ||
| } | ||
|
|
||
| private static AggregateFunction getCannonical(AggregateFunction af, AttributeMap<Expression> aliases) { | ||
| return (AggregateFunction) af.canonical().transformUp(e -> aliases.resolve(e, e)); | ||
| } | ||
|
|
||
|
Comment on lines
+170
to
+173
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've extracted this as @alex-spies asked me to do |
||
| private static String syntheticName(Expression expression, Expression af, int counter) { | ||
| return TemporaryNameUtils.temporaryName(expression, af, counter); | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize it is a bit tricky to describe the change with java doc.
It might be worth linking an issue as it has a bit detailed description.
There are some prior examples with such links.