feat: support shard predicate #604
The SAME predicate is primarily used to assert whether entities in a path are the same. Is the EXISTS syntax implemented in this PR?
Here is a syntax definition for reference.
<same predicate> ::= SAME <left paren> <element variable reference> <comma> <element variable reference> [ { <comma> <element variable reference> }... ] <right paren>
 * SQL node representing a same predicate pattern in GQL.
 * This node represents a pattern where two path patterns share a common predicate condition.
 *
 * <p>Example: MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25)
The SAME predicate is primarily used to assert whether entities in a path are the same. Is the EXISTS syntax implemented here?
There may be a misunderstanding: this feature is not used to check whether entities are the same. It is used to share a condition between multiple path patterns.
 *
 * @return true if distinct, false if union all
 */
public boolean isDistinct() {
The EXISTS SQL node doesn't need distinct or union; the code copied here is redundant.
// Unparse union operator
if (isDistinct) {
    writer.print(" | ");
Make the corresponding changes here as well.
 * MatchSamePredicate(left, right, condition, distinct) ->
 * MatchFilter(MatchUnion(left, right, distinct), condition)
 */
public class SamePredicateOptimizationRule extends RelOptRule {
What does this optimization rule mean?
Optimization Rule Detailed Explanation
1. Core Function of the Optimization Rule
This optimization rule is used to convert the SAME predicate pattern into a more efficient execution plan.
Conversion Process:
MatchSamePredicate(left, right, condition, distinct)
↓ (Optimized)
MatchFilter(MatchUnion(left, right, distinct), condition)
2. Why Is This Optimization Necessary?
Issues before optimization:
- MatchSamePredicate is a special operator, requiring special execution logic.
- The executor needs to understand the semantics of the SAME predicate, resulting in high implementation complexity.
- Difficulty leveraging existing optimization rules (such as predicate pushdown and index optimization).
Benefits after optimization:
- Converted to a standard Union + Filter combination, simplifying execution logic.
- Can leverage existing optimization rules and indexes.
- More standardized execution plans, facilitating further optimization.
3. Specific Conversion Example
Original SQL:
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25)
RETURN a.id, b.id, c.id
Execution Plan before optimization:
MatchSamePredicate(
left: (a:person) -> (b),
right: (a:person) -> (c),
condition: a.age > 25,
distinct: true
)
Optimized Execution Plan:
MatchFilter(
condition: a.age > 25,
input: MatchUnion(
inputs: [(a:person) -> (b), (a:person) -> (c)],
all: false // distinct = true
)
)
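The rewrite shown above can be sketched with a few stand-in plan nodes. This is only an illustration of the transformation, not the PR's code: `PlanNode`, `PathScan`, `SamePredicateNode`, `UnionNode`, `FilterNode`, and `SamePredicateRewrite` are hypothetical names, and conditions are kept as plain strings rather than real expression trees.

```java
// Hypothetical stand-in plan nodes; the PR's real operators are not reproduced here.
interface PlanNode {
    String explain();
}

final class PathScan implements PlanNode {
    private final String pattern;
    PathScan(String pattern) { this.pattern = pattern; }
    public String explain() { return pattern; }
}

final class SamePredicateNode implements PlanNode {
    final PlanNode left;
    final PlanNode right;
    final String condition;
    final boolean distinct;
    SamePredicateNode(PlanNode left, PlanNode right, String condition, boolean distinct) {
        this.left = left;
        this.right = right;
        this.condition = condition;
        this.distinct = distinct;
    }
    public String explain() {
        return "MatchSamePredicate(" + left.explain() + ", " + right.explain()
            + ", " + condition + ", distinct=" + distinct + ")";
    }
}

final class UnionNode implements PlanNode {
    final PlanNode left;
    final PlanNode right;
    final boolean all;
    UnionNode(PlanNode left, PlanNode right, boolean all) {
        this.left = left;
        this.right = right;
        this.all = all;
    }
    public String explain() {
        return "MatchUnion([" + left.explain() + ", " + right.explain() + "], all=" + all + ")";
    }
}

final class FilterNode implements PlanNode {
    final PlanNode input;
    final String condition;
    FilterNode(PlanNode input, String condition) {
        this.input = input;
        this.condition = condition;
    }
    public String explain() {
        return "MatchFilter(" + condition + ", " + input.explain() + ")";
    }
}

final class SamePredicateRewrite {
    // Rewrites only SamePredicateNode; any other node is returned unchanged.
    static PlanNode rewrite(PlanNode node) {
        if (!(node instanceof SamePredicateNode)) {
            return node;
        }
        SamePredicateNode same = (SamePredicateNode) node;
        // distinct = true corresponds to all = false on the union.
        PlanNode union = new UnionNode(same.left, same.right, !same.distinct);
        return new FilterNode(union, same.condition);
    }
}
```

Calling `SamePredicateRewrite.rewrite(...)` on a `SamePredicateNode` built from the two path patterns returns a filter over a union, and `explain()` on the result gives a textual plan in the same shape as the optimized plan above.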
4. Practical Application Examples
Scenario 1: Basic Condition Sharing
-- Find people older than 25 who are both friends and colleagues
MATCH (a:person) -> (b:friend) | (a:person) -> (c:colleague) WHERE SAME(a.age > 25)
Before Optimization: Requires special handling of the SAME predicate
After Optimization:
- First, union the two paths: (a:person) -> (b:friend) ∪ (a:person) -> (c:colleague)
- Then filter: a.age > 25
Scenario 2: Sharing Complex Conditions
-- Finding People Meeting Complex Conditions
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25 AND a.name = 'marko')
After optimization:
- Union: (a:person) -> (b) ∪ (a:person) -> (c)
- Filter: a.age > 25 AND a.name = 'marko'
Scenario 3: DISTINCT Semantics
-- Using DISTINCT Semantics
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25) DISTINCT
After optimization:
- Union with DISTINCT: (a:person) -> (b) ∪ (a:person) -> (c) (de-duplication)
- Filter: a.age > 25
5. Specific Benefits of the Optimization
Performance Optimization:
- Predicate Pushdown: Conditions can be pushed down before the union, reducing the amount of data.
- Index Utilization: Existing indexes can be leveraged to accelerate condition filtering.
- Parallel Execution: Union operations can execute two paths in parallel.
Code Simplification:
- Executor Simplification: No special SAME predicate execution logic is required.
- Maintainability Improvement: Standard union and filter operators are used.
- Test Simplification: Existing union and filter test cases can be reused.
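The predicate pushdown benefit listed above rests on the fact that filtering a union gives the same rows as unioning the filtered branches. A minimal in-memory sketch of that equivalence follows; `UnionFilterDemo` and its methods are hypothetical helpers written for this illustration and do not correspond to any operator in the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper class, used only to illustrate Filter(Union) vs. Union(Filter, Filter).
final class UnionFilterDemo {
    // Concatenates both branches; with distinct = true, duplicate rows are dropped
    // and first-seen order is kept.
    static <T> List<T> union(List<T> left, List<T> right, boolean distinct) {
        List<T> all = new ArrayList<>(left);
        all.addAll(right);
        return distinct ? new ArrayList<>(new LinkedHashSet<>(all)) : all;
    }

    static <T> List<T> filter(List<T> rows, Predicate<T> condition) {
        return rows.stream().filter(condition).collect(Collectors.toList());
    }

    // Shape produced by the rewrite: Filter(Union(left, right)).
    static <T> List<T> filterAfterUnion(List<T> left, List<T> right,
                                        boolean distinct, Predicate<T> condition) {
        return filter(union(left, right, distinct), condition);
    }

    // Shape after pushing the filter into each branch: Union(Filter(left), Filter(right)).
    static <T> List<T> unionOfFiltered(List<T> left, List<T> right,
                                       boolean distinct, Predicate<T> condition) {
        return union(filter(left, condition), filter(right, condition), distinct);
    }
}
```

Both methods return the same rows for any row type with a sensible `equals`/`hashCode`; the pushed-down form simply builds a smaller intermediate union because non-matching rows are discarded per branch.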
6. Optimization Rule Triggering Conditions
// This rule only applies to nodes of the MatchSamePredicate type.
super(operand(MatchSamePredicate.class, any()));
Triggering Time:
- Query optimization phase
- When the execution plan contains a MatchSamePredicate node
- Automatically applied, no manual intervention required
7. Cooperation with Other Optimization Rules
MatchFilterMergeRule:
// Can further merge multiple filters.
MatchFilter(MatchFilter(input, condition1), condition2)
↓
MatchFilter(input, condition1 AND condition2)
Predicate Pushdown Rule:
- Can push filter conditions further down to the data source.
- Reduces the amount of intermediate results.
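The filter merge described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's `MatchFilterMergeRule`: `Plan`, `Source`, `PredicateFilter`, and `FilterMergeSketch` are hypothetical names, and conditions are modeled as `java.util.function.Predicate` so that `condition1 AND condition2` becomes `Predicate.and`.

```java
import java.util.function.Predicate;

// Hypothetical plan nodes, parameterized by the row type T.
interface Plan<T> {}

final class Source<T> implements Plan<T> {
    final String name;
    Source(String name) { this.name = name; }
}

final class PredicateFilter<T> implements Plan<T> {
    final Plan<T> input;
    final Predicate<T> condition;
    PredicateFilter(Plan<T> input, Predicate<T> condition) {
        this.input = input;
        this.condition = condition;
    }
}

final class FilterMergeSketch {
    // Collapses Filter(Filter(x, c1), c2) into Filter(x, c1 AND c2),
    // applied recursively so longer chains of filters also collapse.
    static <T> Plan<T> merge(Plan<T> plan) {
        if (!(plan instanceof PredicateFilter)) {
            return plan;
        }
        PredicateFilter<T> outer = (PredicateFilter<T>) plan;
        Plan<T> input = merge(outer.input);
        if (input instanceof PredicateFilter) {
            PredicateFilter<T> inner = (PredicateFilter<T>) input;
            // The inner condition is tested first, matching the original evaluation order.
            return new PredicateFilter<T>(inner.input, inner.condition.and(outer.condition));
        }
        return new PredicateFilter<T>(input, outer.condition);
    }
}
```

After `merge`, a chain of nested filters over a `Source` becomes a single `PredicateFilter` whose condition accepts a row only if every original condition accepted it.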
8. Actual Execution Performance Comparison
Before optimization:
Execution plan: MatchSamePredicate
- Requires special execution logic
- Difficult to leverage indexes
- Complex execution path
After optimization:
Execution plan: MatchFilter -> MatchUnion
- Uses standard operators
- Can leverage index optimization
- Clear execution path
- Supports further optimization
This optimization rule embodies the core concept of the Query Optimizer: transforming complex semantics into simple, optimizable combinations of standard operators.
Usage: Check whether a match exists for a certain path
Grammar Format:
Actual SQL Example:
Test Case Proof: The project already has a complete EXISTS test case
Purpose: Share the same conditions among multiple path patterns
Grammar Format:
Actual SQL Example:
I personally think what you said makes some sense, but this is rarely needed in practice because the entities in a path are usually different. See the following examples:
SAME as you expected it (entity comparison):
Actual implementation of SAME (condition sharing):
EXISTS (path existence):
What changes were proposed in this pull request?
Related to issue-368
How was this PR tested?