
@fang-xing-esql (Member) commented Sep 30, 2025

This PR enables support for non-correlated subqueries within the FROM command. Related to https://github.com/elastic/esql-planning/issues/89

A non-correlated subquery in this context is one that is fully self-contained and does not reference attributes from the outer query. Enabling support for these subqueries in the FROM command provides an additional way to define a data source, beyond directly specifying index patterns in an ES|QL query.

Example

FROM index1, (FROM index2
              | WHERE a > 10
              | EVAL b = a * 2
              | STATS cnt = COUNT(*) BY c
              | SORT cnt DESC
              | LIMIT 10)
, index3, (FROM index4
           | STATS COUNT(*))
| WHERE d > 10
| STATS max = max(*) BY e
| SORT max DESC

This feature is built on top of Fork. Subqueries are processed in a manner similar to how Fork operates today, with modifications made to the following components to support this functionality:

  • Grammar: FROM_MODE is updated to support subquery syntax.
  • Parser: LogicalPlanBuilder creates a UnionAll logical plan on top of multiple data sources. Each data source can be either an index pattern or a subquery. UnionAll extends Fork, but unlike Fork, each UnionAll leg may fetch data from different indices; this is one of the key differences between UnionAll and Fork.
  • PreAnalyzer: Extracts index patterns from subqueries and issues field_caps calls to build an IndexResolution for each subquery.
  • Analyzer: Resolves indices referenced by subqueries and handles union-typed fields referenced within them. Since subquery index patterns and main query index patterns are accessed separately behind each UnionAll leg, InvalidMappedField instances are not created across them. If conversion functions are required for fields common to the main index and the subquery indices, those conversion functions must be pushed down into each UnionAll leg.
  • LogicalPlanOptimizer: Pushes down eligible filters/predicates from the main query into subqueries. This is another key distinction between UnionAll and Fork, as predicate pushdown applies only to UnionAll, while Fork remains unchanged.
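The pushdown in the last bullet relies on a filter distributing over a union that keeps duplicates: filtering the combined rows yields the same rows as filtering each leg first and then concatenating. A minimal Java sketch of that equivalence, assuming rows are modeled as `Map<String, Object>` and a UnionAll is plain concatenation; `UnionAllPushdownDemo` and its methods are hypothetical names, not ES|QL classes. The sketch says nothing about whether the filter may continue below operators inside a leg, such as a LIMIT or STATS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model: a row is a Map from column name to value, and a UnionAll
// concatenates the rows of its legs without removing duplicates.
class UnionAllPushdownDemo {

    static List<Map<String, Object>> unionAll(List<List<Map<String, Object>>> legs) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (List<Map<String, Object>> leg : legs) {
            out.addAll(leg);
        }
        return out;
    }

    // Plan as written by the user: Filter on top of UnionAll.
    static List<Map<String, Object>> filterAbove(
            List<List<Map<String, Object>>> legs, Predicate<Map<String, Object>> p) {
        return unionAll(legs).stream().filter(p).collect(Collectors.toList());
    }

    // Rewritten plan: the same Filter copied into every leg, then UnionAll.
    static List<Map<String, Object>> filterPushedDown(
            List<List<Map<String, Object>>> legs, Predicate<Map<String, Object>> p) {
        List<List<Map<String, Object>>> filteredLegs = new ArrayList<>();
        for (List<Map<String, Object>> leg : legs) {
            filteredLegs.add(leg.stream().filter(p).collect(Collectors.toList()));
        }
        return unionAll(filteredLegs);
    }
}
```

A row from a leg that lacks the filtered column (for example, `d` missing) reads as null and is rejected in both forms, so the two plans agree even when legs have different columns.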

Restrictions and follow-ups to be addressed in subsequent PRs:

@elasticsearchmachine (Collaborator)

Hi @fang-xing-esql, I've created a changelog YAML for you.


@fang-xing-esql added the test-release label (Trigger CI checks against release build) Sep 30, 2025

@Copilot (Copilot AI, Contributor) left a comment

Pull Request Overview

This PR introduces support for non-correlated subqueries within the FROM command in ES|QL, allowing queries to reference multiple data sources including both index patterns and subqueries. The implementation enables subqueries to be processed similarly to Fork operations, with key distinctions in index resolution and predicate pushdown capabilities.

  • Adds grammar and parser support for subquery syntax in FROM commands
  • Implements UnionAll logical plan to handle mixed index patterns and subqueries
  • Enables predicate pushdown optimization specifically for UnionAll operations

Reviewed Changes

Copilot reviewed 36 out of 39 changed files in this pull request and generated 3 comments.

Summary per file:
  • EsqlBaseParser.g4: Updates grammar to support subquery syntax in FROM_MODE
  • LogicalPlanBuilder.java: Creates UnionAll plans and handles subquery/index pattern combinations
  • UnionAll.java: New logical plan extending Fork with union-typed field support
  • Subquery.java: New logical plan node representing subquery placeholders
  • Analyzer.java: Resolves subquery indices and handles union-typed fields
  • PushDownAndCombineFilters.java: Adds predicate pushdown optimization for UnionAll
  • EsqlSession.java: Implements subquery index resolution during pre-analysis
  • Various test files: Adds comprehensive test coverage for subquery functionality
Comments suppressed due to low confidence (1)

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/SubqueryTests.java:1

  • The comment "nested fork/subquery is not supported, it passes Analyzer" contains a comma splice; it should read "nested fork/subquery is not supported; it passes Analyzer".


}
return parent;
} else { // We should not reach here as the grammar does not allow it
throw new ParsingException("FROM is required in a subquery");
Copilot AI commented Oct 2, 2025

The error message "FROM is required in a subquery" is misleading since the grammar already enforces this requirement. Consider a more descriptive message like "Invalid subquery structure" or remove the comment and exception if this code path is truly unreachable.

Suggested change:
- throw new ParsingException("FROM is required in a subquery");
+ throw new ParsingException("Invalid subquery structure");


LogicalPlan newChild = switch (child) {
case Project project -> maybePushDownFilterPastProjectForUnionAllChild(pushable, project);
case Limit limit -> maybePushDownFilterPastLimitForUnionAllChild(pushable, limit);
default -> null; // TODO add a general push down for unexpected pattern
Copilot AI commented Oct 2, 2025

The TODO comment indicates incomplete functionality. Consider implementing the general pushdown logic, or at least providing a more specific plan for when this will be addressed, as returning null could lead to silent failures in optimization.

Suggested change:
- default -> null; // TODO add a general push down for unexpected pattern
+ default -> {
+     // Fallback: unknown child type, do not push down filter for this child.
+     // Consider implementing general push down logic here in the future.
+     yield child;
+ }


  boolean supportsAggregateMetricDouble,
- boolean supportsDenseVector
+ boolean supportsDenseVector,
+ Set<IndexPattern> subqueryIndices
@fang-xing-esql (Member Author)

Merging subqueryIndices into mainIndices is another option; it would require changes to EsqlCCSUtils.initCrossClusterState and EsqlCCSUtils.createIndexExpressionFromAvailableClusters, as they currently associate the ExecutionInfo with only one index pattern.

hasCapabilities(adminClient(), List.of(ENABLE_FORK_FOR_REMOTE_INDICES.capabilityName()))
);
}
// Subqueries in FROM are not fully tested in CCS yet
@fang-xing-esql (Member Author)

When the query contains a subquery, convertToRemoteIndices does not yet generate a correct remote index pattern, so the query becomes invalid. Subqueries are not fully tested in CCS yet; I'm working on that as a follow-up.

* Simple nested subqueries can be flattened by LogicalPlanBuilder.
* e.g. FROM index1, (FROM index2, (FROM index3, (FROM index4))) ==> FROM index1,index2,index3,index4
*/
public void testSimpleNestedSubquery() {
@fang-xing-esql (Member Author)

Simple nested subqueries can be flattened by LogicalPlanBuilder.
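The flattening in the Javadoc above (FROM index1, (FROM index2, (FROM index3, (FROM index4))) becoming FROM index1,index2,index3,index4) can be sketched as a recursive rewrite over the FROM source list. A minimal Java sketch, assuming a subquery counts as "simple" when it has no processing commands after its own FROM; `FlattenSubqueries`, `Source`, `IndexPattern` and `Subquery` are hypothetical stand-ins, and LogicalPlanBuilder may use different criteria (metadata fields, for instance, are not modeled here).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a FROM source list: each entry is an index pattern or
// a subquery with its own FROM list plus the commands that follow it.
class FlattenSubqueries {
    interface Source {}

    record IndexPattern(String pattern) implements Source {}

    record Subquery(List<Source> from, List<String> commands) implements Source {}

    static IndexPattern idx(String pattern) {
        return new IndexPattern(pattern);
    }

    static Subquery sub(List<String> commands, Source... from) {
        return new Subquery(List.of(from), commands);
    }

    // Inline every subquery that has no commands after its FROM. Subqueries
    // with commands are kept, but their own FROM lists are flattened too.
    static List<Source> flatten(List<Source> sources) {
        List<Source> out = new ArrayList<>();
        for (Source s : sources) {
            if (s instanceof Subquery sq && sq.commands().isEmpty()) {
                out.addAll(flatten(sq.from()));
            } else if (s instanceof Subquery nested) {
                out.add(new Subquery(flatten(nested.from()), nested.commands()));
            } else {
                out.add(s);
            }
        }
        return out;
    }
}
```

A subquery such as (FROM index2 | WHERE a > 10) is not inlined, because moving its WHERE to the outer query would also filter the other sources.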

@fang-xing-esql marked this pull request as ready for review October 2, 2025 15:16
@elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team: ESQL/Aggs/Geo) Oct 2, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine (Collaborator)

Pinging @elastic/kibana-esql (ES|QL-ui)

// then the real child, if there is unknown pattern, keep the filter and UnionAll plan unchanged
List<LogicalPlan> newChildren = new ArrayList<>();
boolean changed = false;
for (LogicalPlan child : unionAll.children()) {
Contributor

Why do you need special handling based on the child type? Couldn't we just put a Filter on top of each child and rely on the existing Filter pushdown rules?

@fang-xing-esql (Member Author) Oct 3, 2025

I check the child types here mainly because I'd like to push the predicate closer to an EsRelation, so that it has a better chance of being pushed down to Lucene. In this PushDownAndCombineFilters rule, if the child is a Limit, filters are not pushed any further. However, AddImplicitForkLimit adds a Limit to each Fork/UnionAll child, and that Limit might prevent the predicate from being pushed down to Lucene.

The patterns checked here are the ones I have seen Fork add so far; other logical planner rules may sometimes eliminate a Project or swap a Project and a Limit.
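The child-type dispatch described above, combined with the all-or-nothing fallback from the code comment ("if there is unknown pattern, keep the filter and UnionAll plan unchanged"), can be sketched on a toy plan model. This is a hypothetical Java sketch, not the PR's PushDownAndCombineFilters: the node records are stand-ins, Project is assumed to be a pure column selection, and the `implicit` flag on Limit is an assumed marker for limits added by AddImplicitForkLimit. Moving a filter below a Limit changes which rows survive, so the sketch does that only for an implicit Limit and treats an explicit LIMIT as blocking; whether this matches maybePushDownFilterPastLimitForUnionAllChild is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy logical plan nodes (hypothetical stand-ins for the ES|QL classes).
class PushDown {
    interface Plan {}
    record Relation(String index) implements Plan {}
    record Project(Plan child, List<String> columns) implements Plan {}
    // 'implicit' marks a Limit added by the planner rather than by the user (assumption).
    record Limit(Plan child, int n, boolean implicit) implements Plan {}
    record Stats(Plan child, String spec) implements Plan {}
    record Filter(Plan child, String condition, Set<String> refs) implements Plan {}
    record UnionAll(List<Plan> children) implements Plan {}

    // Move the filter down one UnionAll child until it sits on the Relation.
    // Returns null when a node of unknown or blocking type is reached.
    static Plan pushInto(Plan node, Filter f) {
        if (node instanceof Relation r) {
            return new Filter(r, f.condition(), f.refs());
        }
        if (node instanceof Project p && p.columns().containsAll(f.refs())) {
            Plan pushed = pushInto(p.child(), f);
            return pushed == null ? null : new Project(pushed, p.columns());
        }
        if (node instanceof Limit l && l.implicit()) {
            Plan pushed = pushInto(l.child(), f);
            return pushed == null ? null : new Limit(pushed, l.n(), true);
        }
        return null; // e.g. Stats or an explicit LIMIT: stop here
    }

    // Rewrite Filter(UnionAll(c1..cn)) to UnionAll(c1'..cn'); if any child
    // cannot take the filter, return the original plan unchanged.
    static Plan pushDown(Filter f) {
        if (!(f.child() instanceof UnionAll u)) {
            return f;
        }
        List<Plan> newChildren = new ArrayList<>();
        for (Plan child : u.children()) {
            Plan pushed = pushInto(child, f);
            if (pushed == null) {
                return f;
            }
            newChildren.add(pushed);
        }
        return new UnionAll(newChildren);
    }
}
```

The all-or-nothing choice keeps the rewrite trivially correct: either every child receives the filter and the top-level Filter disappears, or nothing moves.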

@stratoula

@fang-xing-esql just confirming: after the FROM command, the available fields are the fields of each index plus the results of the subquery. Correct?

assertEquals(expectedPushedFilters, actualPushedFilters);
}

public void testPushDownSimpleFilterPastUnionAll() {
Contributor

Could you add the string representation of the plan as a comment above this unit test, like the other unit tests in this file?

@fang-xing-esql (Member Author)

Added.

@fang-xing-esql (Member Author)

> @fang-xing-esql just confirming: after the FROM command, the available fields are the fields of each index plus the results of the subquery. Correct?

Yes, that's correct.
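A minimal Java sketch of the rule confirmed above, assuming sources are matched by column name and that a column a source does not produce is null for that source's rows. `UnionAllOutput`, `outputColumns` and `align` are hypothetical names; type conflicts between sources are deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnionAllOutput {
    // Output columns of the whole FROM: every column of every source
    // (index fields or subquery results), in first-seen order, without duplicates.
    static List<String> outputColumns(List<List<String>> sourceColumns) {
        Set<String> columns = new LinkedHashSet<>();
        for (List<String> source : sourceColumns) {
            columns.addAll(source);
        }
        return new ArrayList<>(columns);
    }

    // Align a row from one source to the merged output; a column the source
    // does not produce is present with a null value.
    static Map<String, Object> align(Map<String, Object> row, List<String> output) {
        Map<String, Object> aligned = new LinkedHashMap<>();
        for (String column : output) {
            aligned.put(column, row.get(column));
        }
        return aligned;
    }
}
```

With an index exposing a, c, d and a subquery returning cnt and c, the FROM exposes a, c, d, cnt, and the subquery's rows carry null in a and d.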

- match: { esql.available: true }
- match: { esql.enabled: true }
- length: { esql.features: 28 }
- length: { esql.features: 29 }
Contributor

You can look at https://github.com/elastic/elasticsearch/pull/134942/files?w=1 for how to add telemetry properly.
For this file, you can follow the telemetry of one of the other features to add it.
Also add it to FeatureMetric.java, and then check for it in VerifierMetricsTests.java and TelemetryIT.java.

@fang-xing-esql (Member Author)

Added.

super(source, subqueryPlan);
}

private Subquery(StreamInput in) throws IOException {
Contributor

I think we try to have serialization tests for all of our NamedWritables. See LimitSerializationTests for another UnaryPlan with a serialization test.
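The serialization tests requested here follow a round-trip pattern: write the node, read it back, and check that the copy equals the original. A self-contained Java sketch of that pattern using plain java.io streams; `ToySubquery` is a hypothetical stand-in, and the real tests would use Elasticsearch's StreamOutput/StreamInput and the existing serialization test infrastructure (as in LimitSerializationTests) rather than this code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class RoundTripDemo {
    // Hypothetical stand-in for a plan node with two serialized fields.
    record ToySubquery(String source, String child) {
        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(source);
            out.writeUTF(child);
        }

        static ToySubquery readFrom(DataInputStream in) throws IOException {
            // Fields must be read in the same order they were written.
            return new ToySubquery(in.readUTF(), in.readUTF());
        }
    }

    // Serialize, deserialize, and return the copy so a test can compare it.
    static ToySubquery roundTrip(ToySubquery original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            original.writeTo(out);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return ToySubquery.readFrom(in);
        }
    }
}
```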

@fang-xing-esql (Member Author)

Added.

@julian-elastic (Contributor) left a comment

Overall looks very good! I left a few comments about testing and fixing telemetry. They are all nice to have, and since your feature is behind a snapshot flag, they can be addressed in your next PRs if you want to check this one in first.

@astefan self-requested a review October 3, 2025 15:53

@astefan (Contributor) left a comment

I have gone through part of the PR and would like to provide some input. I still have to go over the tests and to understand some parts of the Analyzer.

Thank you for providing the detailed description of the PR. It helps a lot with the review.

FROM sample_data, (FROM employees metadata _id | sort _id) metadata _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id

results in

Found 1 problem\nline 1:50: Unbounded SORT not supported yet [sort _id] please add a LIMIT

This seems to imply that the default limit we usually add to queries is not added to subqueries.
If this is an acceptable and agreed-upon limitation, I think it would help to document it in the PR/docs.
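The behavior observed above can be modeled as: the top-level query gets an implicit LIMIT appended, a subquery does not, and a SORT with no LIMIT after it in the same pipeline is rejected. A simplified Java sketch under those assumptions; commands are plain strings and `UnboundedSortCheck` is a hypothetical name. The real verifier works on the optimized plan, where a SORT can also be merged into a TopN or dropped, so this is only an approximation.

```java
import java.util.ArrayList;
import java.util.List;

class UnboundedSortCheck {
    // Only the top-level query gets the implicit LIMIT appended (assumption
    // based on the observed error; subqueries are left as written).
    static List<String> withImplicitLimit(List<String> commands, boolean topLevel) {
        List<String> out = new ArrayList<>(commands);
        if (topLevel) {
            out.add("LIMIT");
        }
        return out;
    }

    // True when a SORT is not followed by any LIMIT later in the same pipeline.
    static boolean hasUnboundedSort(List<String> commands) {
        boolean pendingSort = false;
        for (String command : commands) {
            if (command.equals("SORT")) {
                pendingSort = true;
            } else if (command.equals("LIMIT")) {
                pendingSort = false;
            }
        }
        return pendingSort;
    }
}
```

Under this model, (FROM employees metadata _id | sort _id) is rejected as a subquery but would pass as a top-level query, which matches the error reported above.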

FROM (FROM *) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id

results in

Cannot use field [emp_no] due to ambiguities being mapped as [2] incompatible types: [integer] in [employees], [long] in [employees_incompatible]

but
FROM *, (FROM *) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
doesn't complain. Is the first error valid?

Even FROM * metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id complains.

Apologies if this is already covered, but I wanted to mention it so we don't forget. Since this also involves field_caps calls, we should test this functionality with a filter in the request. As a regular user, I would expect that filter to also apply to subqueries, and I think it does.
{
    "query": "FROM *, (FROM * metadata _index) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, _id | stats count=count(*) by _index",
    "filter": {
        "bool": {
            "filter": [
                {
                    "exists": {
                        "field": "emp_no"
                    }
                }
            ]
        }
    }
}
I am wondering whether this is the expected behavior; to be honest, I couldn't tell:

FROM employees, (FROM employees | eval x = emp_no::long), (FROM employees | eval x = emp_no::string) metadata _index | keep x, emp_no, _index
results in column "x" containing only null values, whereas running
from employees | fork (eval x = emp_no::string) (eval x = emp_no::long) | keep x, emp_no
produces an error message:

"Column [x] has conflicting data types in FORK branches: [LONG] and [KEYWORD]"

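The FORK error quoted above comes from requiring each column to have a single type across branches. A hedged Java sketch of such a check, which is one option for UnionAll but not necessarily what the PR does; `BranchTypeCheck` is a hypothetical name, the message only imitates the quoted FORK error, and the order in which the two types are reported is not meant to match it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class BranchTypeCheck {
    // Each branch is a map from output column name to its type name.
    // Returns a message for the first column seen with two different types,
    // or Optional.empty() when every column has one type across all branches.
    static Optional<String> firstConflict(List<Map<String, String>> branches) {
        Map<String, String> seen = new HashMap<>();
        for (Map<String, String> branch : branches) {
            for (Map.Entry<String, String> column : branch.entrySet()) {
                String previous = seen.putIfAbsent(column.getKey(), column.getValue());
                if (previous != null && !previous.equals(column.getValue())) {
                    return Optional.of("Column [" + column.getKey()
                        + "] has conflicting data types in branches: ["
                        + previous + "] and [" + column.getValue() + "]");
                }
            }
        }
        return Optional.empty();
    }
}
```

Rejecting the query is only one possible policy; nulling the column, as the subquery example does, or inserting conversion functions per branch are alternatives.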

/**
* Handle union types in UnionAll:
* 1. Push down explicit conversion functions into the UnionAll legs
Contributor

Would this be better worded as "... UnionAll branches"? That is, replace all references to "leg" with "branch". Just my 2c.


public class UnionAll extends Fork implements PostOptimizationPlanVerificationAware {

private final List<Attribute> output;
Contributor

Is this really needed if Fork already has it? I don't see anything (obvious) that warrants it... maybe I'm wrong.

Labels: :Analytics/ES|QL (AKA ESQL), >enhancement, ES|QL-ui (Impacts ES|QL UI), Team:Analytics (Meta label for analytical engine team: ESQL/Aggs/Geo), test-release (Trigger CI checks against release build), v9.3.0

5 participants