-
Notifications
You must be signed in to change notification settings - Fork 25.5k
[ES|QL] Non-Correlated Subquery in FROM command #135744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi @fang-xing-esql, I've created a changelog YAML for you. |
8a72832
to
0c5b79d
Compare
Hi @fang-xing-esql, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces support for non-correlated subqueries within the FROM command in ES|QL, allowing queries to reference multiple data sources including both index patterns and subqueries. The implementation enables subqueries to be processed similarly to Fork operations, with key distinctions in index resolution and predicate pushdown capabilities.
- Adds grammar and parser support for subquery syntax in FROM commands
- Implements UnionAll logical plan to handle mixed index patterns and subqueries
- Enables predicate pushdown optimization specifically for UnionAll operations
Reviewed Changes
Copilot reviewed 36 out of 39 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
EsqlBaseParser.g4 | Updates grammar to support subquery syntax in FROM_MODE |
LogicalPlanBuilder.java | Creates UnionAll plans and handles subquery/index pattern combinations |
UnionAll.java | New logical plan extending Fork with union-typed field support |
Subquery.java | New logical plan node representing subquery placeholders |
Analyzer.java | Resolves subquery indices and handles union-typed fields |
PushDownAndCombineFilters.java | Adds predicate pushdown optimization for UnionAll |
EsqlSession.java | Implements subquery index resolution during pre-analysis |
Various test files | Adds comprehensive test coverage for subquery functionality |
Comments suppressed due to low confidence (1)
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/SubqueryTests.java:1
- There's a typo in "nested fork/subquery is not supported, it passes Analyzer" - should be "nested fork/subquery is not supported; it passes Analyzer" (semicolon instead of comma for better grammar).
/*
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
} | ||
return parent; | ||
} else { // We should not reach here as the grammar does not allow it | ||
throw new ParsingException("FROM is required in a subquery"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message "FROM is required in a subquery" is misleading since the grammar already enforces this requirement. Consider a more descriptive message like "Invalid subquery structure" or remove the comment and exception if this code path is truly unreachable.
throw new ParsingException("FROM is required in a subquery"); | |
throw new ParsingException("Invalid subquery structure"); |
Copilot uses AI. Check for mistakes.
LogicalPlan newChild = switch (child) { | ||
case Project project -> maybePushDownFilterPastProjectForUnionAllChild(pushable, project); | ||
case Limit limit -> maybePushDownFilterPastLimitForUnionAllChild(pushable, limit); | ||
default -> null; // TODO add a general push down for unexpected pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment indicates incomplete functionality. Consider implementing the general push down logic or at least provide a more specific plan for when this will be addressed, as returning null could lead to silent failures in optimization.
default -> null; // TODO add a general push down for unexpected pattern | |
default -> { | |
// Fallback: unknown child type, do not push down filter for this child. | |
// Consider implementing general push down logic here in the future. | |
yield child; | |
} |
Copilot uses AI. Check for mistakes.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Show resolved
Hide resolved
boolean supportsAggregateMetricDouble, | ||
boolean supportsDenseVector | ||
boolean supportsDenseVector, | ||
Set<IndexPattern> subqueryIndices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging this subqueryIndices
into the mainIndices is another option, it will require changes to EsqlCCSUtils.initCrossClusterState
and EsqlCCSUtils.createIndexExpressionFromAvailableClusters
, as they associate the ExecutionInfo
with only one index pattern today.
hasCapabilities(adminClient(), List.of(ENABLE_FORK_FOR_REMOTE_INDICES.capabilityName())) | ||
); | ||
} | ||
// Subqueries in FROM are not fully tested in CCS yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there is subquery exists in the query convertToRemoteIndices
doesn't generate a correct remote index pattern yet, the query becomes invalid. Subqueries are not fully tested in CCS yet, working on it as a follow up.
* Simple nested subqueries can be flattened by LogicalPlanBuilder. | ||
* e.g. FROM index1, (FROM index2, (FROM index3, (FROM index4))) ==> FROM index1,index2,index3,index4 | ||
*/ | ||
public void testSimpleNestedSubquery() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple nested subqueries can be flattened by LogicalPlanBuilder
.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/kibana-esql (ES|QL-ui) |
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Show resolved
Hide resolved
// then the real child, if there is unknown pattern, keep the filter and UnionAll plan unchanged | ||
List<LogicalPlan> newChildren = new ArrayList<>(); | ||
boolean changed = false; | ||
for (LogicalPlan child : unionAll.children()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to special handle based on Child type? Just put a filter on top and we already have rules for handling Filter pushdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason that the children types are checked here is that I'd like to push the predicate closer to an EsRelation
, so that the predicate has more chance to be pushed down to lucene. In this PushDownAndCombineFilters
rule here, if the child is a limit
, filters are not pushed further. However, AddImplicitForkLimit
adds a limit to each fork
/unionall
child, and this limit might prevent us from pushing down the predicate to lucene.
The patterns checked here are what I have seen so far that's added by fork, sometimes the other logical planner rules may eliminate a project
, or swap project
and limit
.
@fang-xing-esql just confirming. After the from command the available fields are the field of each index + the results of the subquery. Correct? |
assertEquals(expectedPushedFilters, actualPushedFilters); | ||
} | ||
|
||
public void testPushDownSimpleFilterPastUnionAll() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add the string representation of the plan in comments above the UT similar to the rest of the UTs in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
Yes, that's correct. |
- match: { esql.available: true } | ||
- match: { esql.enabled: true } | ||
- length: { esql.features: 28 } | ||
- length: { esql.features: 29 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can look at https://github.com/elastic/elasticsearch/pull/134942/files?w=1 for how to add telemetry properly.
You can follow one of the other feature telemetries for this file to add set.
Also add it to FeatureMetric.java and then check for it in VerifierMetricsTests.java and TelemetryIT.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
super(source, subqueryPlan); | ||
} | ||
|
||
private Subquery(StreamInput in) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we try to have serialization tests for all of our NamedWritables. See LimitSerializationTests for another UnaryPlan with serialization test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks very good! I left a few comments with regards to testing and fixing telemetry. They are all good to have and as your feature is behind snapshot flag they can be addressed in your next PRs if you want to check this one first.
…st in PushDownAndCombineFiltersTests to include the plan strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the PR in part and would like to provide some input. I still have to go over the tests and still to understand some parts of the Analyzer.
Thank you for providing the detailed description of the PR. It helps a lot with the review.
FROM sample_data, (FROM employees metadata _id | sort _id) metadata _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
results in
Found 1 problem\nline 1:50: Unbounded SORT not supported yet [sort _id] please add a LIMIT
This seems to imply that the "default" limit that we usually add to queries is not added to subqueries.
IF this is an acceptable and agreed upon limitation, I think it would help to have it documented in the PR/docs.
FROM (FROM *) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
results in
Cannot use field [emp_no] due to ambiguities being mapped as [2] incompatible types: [integer] in [employees], [long] in [employees_incompatible]",
but
FROM *, (FROM *) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
doesn't complain. Is the first error valid?
Even FROM * metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
complains.
- Apologies if this is already covered, but I wanted to mention this not to forget about it. Since this is also about
field_caps
calls, using afilter
in the request should be something we test for this functionality. As a regular user I would expect that filter to also apply to subqueries, and I think it does.
"query":"FROM *, (FROM * metadata _index) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, _id | stats count=count(*) by _index",
"filter": {
"bool": {
"filter": [
{
"exists": {
"field": "emp_no"
}
}
]
}
}
- I am wondering if this behavior is the expected one, because I couldn't tell tbh:
FROM employees, (FROM employees | eval x = emp_no::long), (FROM employees | eval x = emp_no::string) metadata _index | keep x, emp_no, _index
results in column "x" having all values as "null" while if I run
from employees | fork (eval x = emp_no::string) (eval x = emp_no::long) | keep x, emp_no
I get an error message
"Column [x] has conflicting data types in FORK branches: [LONG] and [KEYWORD]"
|
||
/** | ||
* Handle union types in UnionAll: | ||
* 1. Push down explicit conversion functions into the UnionAll legs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this better be worded as ... UnionAll branches
? As in replace all references to "leg" with "branch". My 2c
|
||
public class UnionAll extends Fork implements PostOptimizationPlanVerificationAware { | ||
|
||
private final List<Attribute> output; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really needed, if Fork already has this? I don't see anything (obvious) that warrants this.... maybe I'm wrong.
This PR enables support for
non-correlated subqueries
within theFROM
command. Related to https://github.com/elastic/esql-planning/issues/89A
non-correlated subquery
in this context is one that is fully self-contained and does not reference attributes from the outer query. Enabling support for these subqueries in theFROM
command provides an additional way to define a data source, beyond directly specifying index patterns in anES|QL
query.Example
This feature is built on top of
Fork
. Subqueries are processed in a manner similar to howFork
operates today, with modifications made to the following components to support this functionality:FROM_MODE
is updated to support subquery syntax.LogicalPlanBuilder
creates aUnionAll
logical plan on top of multiple data sources. Each data source can be either index patterns or subqueries.UnionAll
extendsFork
, but unlikeFork
, eachUnionAll
leg may fetch data from different indices—this is one of the key differences betweenUnionAll
andFork
.fieldcaps
calls to build anIndexResolution
for each subquery.UnionAll
leg,InvalidMappedField
are not created across them. If conversion functions are required for common fields between the main index and subquery indices, those conversion functions must be pushed down into eachUnionAll
leg.UnionAll
andFork
, as predicate pushdown applies only toUnionAll
, whileFork
remains unchanged.Restrictions and follow ups to be addressed in the next PRs:
LogicalPlanOptimizer
will error out, if the subquery has commands besidesFROM
command. This is tracked in [ES|QL] Allow nested non-correlated subqueries in from command #136034.