feat: support shard predicate #604

kitalkuyo-gita · 2025-09-02T06:31:34Z

What changes were proposed in this pull request?

How was this PR tested?

Tests have been added for the changes
Production environment verified

Leomrlin

The SAME predicate is primarily used to assert whether entities in a path are the same. Is the EXISTS syntax implemented in this PR?

Here is a syntax definition for reference.
<same predicate> ::= SAME <left paren> <element variable reference> <comma> <element variable reference> [ { <comma> <element variable reference> }... ] <right paren>
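To make the grammar above concrete, here is a minimal Java sketch of the entity-comparison reading of SAME. It assumes, hypothetically, that each bound element is identified by a `long` id; `SameCheck` and `same` are illustrative names, not GeaFlow APIs.

```java
import java.util.Arrays;
import java.util.List;

public class SameCheck {
    // Hypothetical helper: true when every referenced element has the same id.
    // The grammar requires at least two element variable references.
    public static boolean same(List<Long> elementIds) {
        if (elementIds.size() < 2) {
            throw new IllegalArgumentException("SAME needs at least two element references");
        }
        long first = elementIds.get(0);
        for (long id : elementIds) {
            if (id != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // SAME(a, b, c) where all three variables are bound to element 1.
        System.out.println(same(Arrays.asList(1L, 1L, 1L)));
        // SAME(a, b) where the variables are bound to different elements.
        System.out.println(same(Arrays.asList(1L, 2L)));
    }
}
```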

Leomrlin · 2025-09-04T08:43:28Z

...geaflow-dsl-parser/src/main/java/org/apache/geaflow/dsl/sqlnode/SqlSamePredicatePattern.java

+ * SQL node representing a same predicate pattern in GQL.
+ * This node represents a pattern where two path patterns share a common predicate condition.
+ *
+ * <p>Example: MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25)


The SAME predicate is primarily used to assert whether entities in a path are the same. Is the EXISTS syntax implemented here?

You may have misunderstood: this feature does not check whether entities are the same. It shares a condition between multiple path patterns.

Leomrlin · 2025-09-04T08:46:06Z

...geaflow-dsl-parser/src/main/java/org/apache/geaflow/dsl/sqlnode/SqlSamePredicatePattern.java

+     *
+     * @return true if distinct, false if union all
+     */
+    public boolean isDistinct() {


The EXISTS SQL node doesn't need distinct or union; the code copied here is redundant.

Leomrlin · 2025-09-04T08:47:34Z

...geaflow-dsl-parser/src/main/java/org/apache/geaflow/dsl/sqlnode/SqlSamePredicatePattern.java

+
+        // Unparse union operator
+        if (isDistinct) {
+            writer.print(" | ");


Please make the corresponding changes here.

Leomrlin · 2025-09-04T08:50:51Z

...l-plan/src/main/java/org/apache/geaflow/dsl/optimize/rule/SamePredicateOptimizationRule.java

+ * MatchSamePredicate(left, right, condition, distinct) ->
+ * MatchFilter(MatchUnion(left, right, distinct), condition)
+ */
+public class SamePredicateOptimizationRule extends RelOptRule {


What does this optimization rule mean?

Optimization Rule Detailed Explanation

1. Core Function of the Optimization Rule

This optimization rule is used to convert the SAME predicate pattern into a more efficient execution plan.

Conversion Process:

MatchSamePredicate(left, right, condition, distinct)
↓ (Optimized)
MatchFilter(MatchUnion(left, right, distinct), condition)

2. Why is this optimization necessary?

Issues before optimization:

MatchSamePredicate is a special operator, requiring special execution logic.

The executor needs to understand the semantics of the SAME predicate, resulting in high implementation complexity.

It is difficult to leverage existing optimization rules (such as predicate pushdown and index optimization).

Benefits after optimization:

Converted to a standard Union + Filter combination, simplifying execution logic.

Can leverage existing optimization rules and indexes.

More standardized execution plans, facilitating further optimization.

3. Specific Conversion Example

Original SQL:

MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25) RETURN a.id, b.id, c.id

Execution Plan before optimization:

MatchSamePredicate(
  left: (a:person) -> (b),
  right: (a:person) -> (c),
  condition: a.age > 25,
  distinct: true
)

Optimized Execution Plan:

MatchFilter(
  condition: a.age > 25,
  input: MatchUnion(
    inputs: [(a:person) -> (b), (a:person) -> (c)],
    all: false // distinct = true
  )
)
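The conversion above can be sketched in plain Java. The actual rule in this PR extends Calcite's `RelOptRule`; the classes below are toy stand-ins written only for illustration, so `SameRewriteSketch`, `Path`, and `rewrite` are assumptions rather than GeaFlow or Calcite APIs.

```java
public class SameRewriteSketch {
    // Toy plan nodes; these are NOT the GeaFlow/Calcite classes.
    public interface Plan {}

    public static final class Path implements Plan {
        final String pattern;
        public Path(String pattern) { this.pattern = pattern; }
        @Override public String toString() { return pattern; }
    }

    public static final class MatchSamePredicate implements Plan {
        final Plan left;
        final Plan right;
        final String condition;
        final boolean distinct;
        public MatchSamePredicate(Plan left, Plan right, String condition, boolean distinct) {
            this.left = left;
            this.right = right;
            this.condition = condition;
            this.distinct = distinct;
        }
    }

    public static final class MatchUnion implements Plan {
        final Plan left;
        final Plan right;
        final boolean all;
        public MatchUnion(Plan left, Plan right, boolean all) {
            this.left = left;
            this.right = right;
            this.all = all;
        }
        @Override public String toString() {
            return "MatchUnion(" + left + ", " + right + ", all=" + all + ")";
        }
    }

    public static final class MatchFilter implements Plan {
        final Plan input;
        final String condition;
        public MatchFilter(Plan input, String condition) {
            this.input = input;
            this.condition = condition;
        }
        @Override public String toString() {
            return "MatchFilter(" + input + ", " + condition + ")";
        }
    }

    // Rewrites a SAME node to Filter(Union(...)); any other node passes through unchanged.
    public static Plan rewrite(Plan plan) {
        if (plan instanceof MatchSamePredicate) {
            MatchSamePredicate same = (MatchSamePredicate) plan;
            // distinct = true corresponds to all = false on the union.
            Plan union = new MatchUnion(same.left, same.right, !same.distinct);
            return new MatchFilter(union, same.condition);
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan before = new MatchSamePredicate(
            new Path("(a:person) -> (b)"), new Path("(a:person) -> (c)"), "a.age > 25", true);
        System.out.println(rewrite(before));
    }
}
```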

4. Practical Application Examples

Scenario 1: Basic Condition Sharing

-- Find people older than 25 who are both friends and colleagues
MATCH (a:person) -> (b:friend) | (a:person) -> (c:colleague) WHERE SAME(a.age > 25)

Before Optimization: Requires special handling of the SAME predicate
After Optimization:

First, union the two paths: (a:person) -> (b:friend) ∪ (a:person) -> (c:colleague)

Then filter: a.age > 25

Scenario 2: Sharing Complex Conditions

-- Find people meeting complex conditions
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25 AND a.name = 'marko')

After optimization:

Union: (a:person) -> (b) ∪ (a:person) -> (c)

Filter: a.age > 25 AND a.name = 'marko'

Scenario 3: DISTINCT Semantics

-- Using DISTINCT semantics
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25) DISTINCT

After optimization:

Union with DISTINCT: (a:person) -> (b) ∪ (a:person) -> (c) (de-duplication)

Filter: a.age > 25
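The union-then-filter steps in the scenarios above can be sketched in Java on toy data. This is a simplification under stated assumptions: `Row` is a hypothetical flattened result row (the shared vertex `a` plus the id of the other endpoint), and `union` and `filter` are illustrative helpers, not the GeaFlow runtime operators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

public class UnionThenFilter {
    // Toy result row (hypothetical): shared vertex a (id, age) plus the other endpoint's id.
    public static final class Row {
        public final long aId;
        public final int aAge;
        public final long targetId;

        public Row(long aId, int aAge, long targetId) {
            this.aId = aId;
            this.aAge = aAge;
            this.targetId = targetId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Row)) {
                return false;
            }
            Row r = (Row) o;
            return aId == r.aId && aAge == r.aAge && targetId == r.targetId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(aId, aAge, targetId);
        }
    }

    // Step 1: union the rows of both path patterns; distinct removes duplicate rows.
    public static List<Row> union(List<Row> left, List<Row> right, boolean distinct) {
        List<Row> all = new ArrayList<Row>(left);
        all.addAll(right);
        if (distinct) {
            return new ArrayList<Row>(new LinkedHashSet<Row>(all));
        }
        return all;
    }

    // Step 2: apply the shared condition once, on top of the union.
    public static List<Row> filter(List<Row> rows, Predicate<Row> condition) {
        List<Row> out = new ArrayList<Row>();
        for (Row row : rows) {
            if (condition.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = Arrays.asList(new Row(1, 30, 2), new Row(3, 20, 4));
        List<Row> right = Arrays.asList(new Row(1, 30, 2), new Row(1, 30, 5));
        // SAME(a.age > 25) with DISTINCT: rows (1,30,2) and (1,30,5) remain.
        System.out.println(filter(union(left, right, true), r -> r.aAge > 25).size());
        // Without DISTINCT the duplicate (1,30,2) is kept, so three rows remain.
        System.out.println(filter(union(left, right, false), r -> r.aAge > 25).size());
    }
}
```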

5. Specific Benefits of the Optimization

Performance Optimization:

Predicate Pushdown: Conditions can be pushed down before the union, reducing the amount of data.

Index Utilization: Existing indexes can be leveraged to accelerate condition filtering.

Parallel Execution: Union operations can execute two paths in parallel.

Code Simplification:

Executor Simplification: No special SAME predicate execution logic is required.

Maintainability Improvement: Standard union and filter operators are used.

Test Simplification: Existing union and filter test cases can be reused.
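The predicate pushdown benefit listed above relies on filtering before the union giving the same rows as filtering after it. A small Java sketch of that equivalence on toy data follows; the lists stand in for the `a.age` values produced by each path pattern, and `PushdownSketch` and its methods are hypothetical names. The equivalence only holds for conditions that each branch can evaluate on its own, such as `a.age > 25`; a condition that spans both branches, such as `b.id != c.id`, cannot be pushed into a single branch this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class PushdownSketch {
    // Each list holds a.age for the rows produced by one path pattern (toy data).
    public static List<Integer> unionAll(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<Integer>(left);
        out.addAll(right);
        return out;
    }

    public static List<Integer> filter(List<Integer> ages, IntPredicate condition) {
        List<Integer> out = new ArrayList<Integer>();
        for (int age : ages) {
            if (condition.test(age)) {
                out.add(age);
            }
        }
        return out;
    }

    // Compares Filter(Union(l, r)) with Union(Filter(l), Filter(r)).
    public static boolean pushdownIsEquivalent(
            List<Integer> left, List<Integer> right, IntPredicate condition) {
        List<Integer> filterAfterUnion = filter(unionAll(left, right), condition);
        List<Integer> filterBeforeUnion =
            unionAll(filter(left, condition), filter(right, condition));
        return filterAfterUnion.equals(filterBeforeUnion);
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(30, 20, 41);
        List<Integer> right = Arrays.asList(18, 26);
        System.out.println(pushdownIsEquivalent(left, right, age -> age > 25));
    }
}
```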

6. Optimization Rule Triggering Conditions

// This rule only applies to nodes of the MatchSamePredicate type.
super(operand(MatchSamePredicate.class, any()));

Triggering Time:

Query optimization phase

When the execution plan contains a MatchSamePredicate node

Automatically applied, no manual intervention required

7. Cooperation with Other Optimization Rules

MatchFilterMergeRule:

// Can further merge multiple filters.
MatchFilter(MatchFilter(input, condition1), condition2)
↓
MatchFilter(input, condition1 AND condition2)
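A minimal Java sketch of the filter merge described above, using toy plan nodes. It is not the actual `MatchFilterMergeRule` implementation, and `FilterMergeSketch`, `Scan`, and `merge` are hypothetical names introduced for illustration.

```java
public class FilterMergeSketch {
    // Toy plan nodes; these are NOT the GeaFlow/Calcite classes.
    public interface Plan {}

    public static final class Scan implements Plan {
        final String name;
        public Scan(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    public static final class Filter implements Plan {
        final Plan input;
        final String condition;
        public Filter(Plan input, String condition) {
            this.input = input;
            this.condition = condition;
        }
        @Override public String toString() {
            return "MatchFilter(" + input + ", " + condition + ")";
        }
    }

    // Collapses a chain of nested filters into one filter with AND-ed conditions.
    public static Plan merge(Plan plan) {
        if (!(plan instanceof Filter)) {
            return plan;
        }
        Filter outer = (Filter) plan;
        // Merge the input first so chains of any depth end up as a single filter.
        Plan input = merge(outer.input);
        if (input instanceof Filter) {
            Filter inner = (Filter) input;
            return new Filter(inner.input, inner.condition + " AND " + outer.condition);
        }
        return new Filter(input, outer.condition);
    }

    public static void main(String[] args) {
        Plan nested = new Filter(new Filter(new Scan("input"), "condition1"), "condition2");
        System.out.println(merge(nested));
    }
}
```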

Predicate Pushdown Rule:

Can push filter conditions further down to the data source.

Reduces the amount of intermediate results.

8. Actual Execution Performance Comparison

Before optimization:

Execution plan: MatchSamePredicate
- Requires special execution logic
- Difficult to leverage indexes
- Complex execution path

After optimization:

Execution plan: MatchFilter -> MatchUnion
- Uses standard operators
- Can leverage index optimization
- Clear execution path
- Supports further optimization

This optimization rule embodies the core concept of the Query Optimizer: transforming complex semantics into simple, optimizable combinations of standard operators.

kitalkuyo-gita · 2025-09-05T06:03:27Z

The SAME predicate is primarily used to assert whether entities in a path are same. Is the EXISTS syntax implemented in this pr?

Here is a syntax definition for reference. <same predicate> ::= SAME <left paren> <element variable reference> <comma> <element variable reference> [ { <comma> <element variable reference> }... ] <right paren>

EXISTS Syntax - Fully Implemented Feature

Usage: Check if a match exists for a certain path

Grammar Format:
EXISTS PathPattern

Actual SQL Example:

-- Check whether node b has an outgoing edge to node c
MATCH (a:person WHERE id = 1)-[e]->(b)
WHERE EXISTS (b) -> (c)
RETURN a, e, b

-- Check whether node b has an incoming edge
MATCH (a:person WHERE id = 1)-[e]->(b)
WHERE EXISTS (b) <- (c:person WHERE id != 1)
RETURN a, e, b

-- Negated existence check
MATCH (a:person WHERE id = 1)-[e]->(b)
WHERE NOT EXISTS (b) -> (c)
RETURN a, e, b
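The semantics of `EXISTS (b) -> (c)` and its negation can be sketched in Java on a toy adjacency map. This assumes a hypothetical in-memory graph keyed by vertex id; `ExistsSketch`, `existsOutgoing`, and `keep` are illustrative names and say nothing about how GeaFlow evaluates the subquery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExistsSketch {
    // Hypothetical adjacency map: vertex id -> ids reachable over one outgoing edge.
    public static boolean existsOutgoing(Map<Long, List<Long>> out, long vertexId) {
        List<Long> targets = out.get(vertexId);
        return targets != null && !targets.isEmpty();
    }

    // Keeps candidates b that satisfy EXISTS (b) -> (c), or NOT EXISTS when negate is true.
    public static List<Long> keep(Map<Long, List<Long>> out, List<Long> candidates, boolean negate) {
        List<Long> kept = new ArrayList<Long>();
        for (long b : candidates) {
            if (existsOutgoing(out, b) != negate) {
                kept.add(b);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> out = new HashMap<Long, List<Long>>();
        out.put(1L, Arrays.asList(2L, 3L));
        out.put(2L, Arrays.asList(4L));
        out.put(3L, Collections.<Long>emptyList());
        // Vertex 1 reaches b = 2 and b = 3; only 2 has an outgoing edge.
        System.out.println(keep(out, Arrays.asList(2L, 3L), false));
        System.out.println(keep(out, Arrays.asList(2L, 3L), true));
    }
}
```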

Test case evidence: the project already has complete EXISTS test cases:

gql_subquery_005.sql
gql_subquery_006.sql
gql_subquery_007.sql
gql_subquery_009.sql

SAME predicate - New feature implemented in this PR

Purpose: Share the same condition among multiple path patterns

Grammar Format:
MATCH path1 | path2 WHERE SAME(condition)

Actual SQL Example:

-- Basic usage: both paths require a.age > 25
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25)
RETURN a.id as a_id, a.age as a_age, b.id as b_id, c.id as c_id

-- DISTINCT semantics are supported
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25) DISTINCT
RETURN a.id as a_id, a.age as a_age, b.id as b_id, c.id as c_id

-- Multiple path patterns are supported
MATCH (a:person) -> (b) | (a:person) -> (c) | (a:person) -> (d) WHERE SAME(a.age > 25)
RETURN a.id as a_id, a.age as a_age, b.id as b_id, c.id as c_id, d.id as d_id

-- Complex condition: a condition involving multiple variables
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25 AND b.id != c.id)
RETURN a.id as a_id, a.age as a_age, b.id as b_id, c.id as c_id

kitalkuyo-gita · 2025-09-05T06:12:48Z

The SAME predicate is primarily used to assert whether entities in a path are same. Is the EXISTS syntax implemented in this pr?

Here is a syntax definition for reference. <same predicate> ::= SAME <left paren> <element variable reference> <comma> <element variable reference> [ { <comma> <element variable reference> }... ] <right paren>

I think what you said makes some sense, but entity comparison is rarely needed in practice because the entities in a path are usually different. Compare the following examples.

SAME as you expected it (entity comparison):

-- Check if a, b, and c are the same entity
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a, b, c)
-- This is rarely needed in practice because the entities in the path are usually different

Actual implementation of SAME (Conditional Sharing):

-- Find people over 25 years old who are connected to both b and c at the same time
MATCH (a:person) -> (b) | (a:person) -> (c) WHERE SAME(a.age > 25)
-- This is very useful in practice, ensuring that multiple paths meet the same conditions

EXISTS (path existence):

-- Find people with friends
MATCH (a:person) WHERE EXISTS (a) -> (b:person)
-- Find people without friends
MATCH (a:person) WHERE NOT EXISTS (a) -> (b:person)

kitalkuyo-gita added 7 commits September 2, 2025 11:48

support same predicate

378abf1

enhance: support same predicate

2855391

enhace: support same predicate

9b9ee6c

add test files

c0d7717

add test case detail

eebfaef

fix checkstyle

933c105

support Operand

50a7bf6

Leomrlin reviewed Sep 4, 2025

update introduce

8ae4ca2

kitalkuyo-gita added 5 commits September 9, 2025 14:41

refactor code

c254cd9

change comment

6387ab6

fix checkstyle

4ddc60f

fix checkstyle

59aed1c

Merge remote-tracking branch 'upstream/master' into issue-368

239391b

Leomrlin mentioned this pull request Sep 18, 2025

Proposal: Release Apache GeaFlow's First Post-Incubation Version (0.7.0) #625

Closed

fix checkstyle

fe156c4

kitalkuyo-gita changed the title ~~feat: support same predicate~~ feat: support shard predicate Nov 12, 2025

kitalkuyo-gita requested a review from Leomrlin November 16, 2025 04:21

kitalkuyo-gita added 9 commits November 16, 2025 16:05

bugfix: add shard function

930c6c5

fix checkstyle

bbf29a5

fix checkstyle

ced977b

test: add test files

23f36a1

fix tests

02b3bc8

feat: fix distinct syntax

d5a92de

fix tests

ec7b154

fix tests

3c61f83

fix tests

2a53bd7

kitalkuyo-gita mentioned this pull request Nov 19, 2025

feat: Added implementation of standard ISO-GQL syntax(4)：value type predicate #673

Open

2 tasks
