Skip to content

Conversation

@KKould
Copy link
Member

@KKould KKould commented Mar 16, 2024

What problem does this PR solve?

Fix code for:#164

Code changes

  • Has Rust code change
  • Has CI related scripts change

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Note for reviewer

@crwen
Copy link
Member

crwen commented Mar 16, 2024

Have you tried this before?

CREATE TABLE t1(pk int primary key, a int, b int);
CREATE TABLE t2(pk int primary key, a int, c int);
explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where a > 1);
                               PLAN
-------------------------------------------------------------------
 Projection [t1.b] [Project]                                      +
   LeftSemi Join On t1.a = (t1.a) as (_temp_table_1_.a) [HashJoin]+
     Scan t1 -> [a, b] [SeqScan]                                  +
     Projection [t1.a] [Project]                                  +
       Filter (t1.a > 1), Is Having: false [Filter]               +
         Scan t2 -> [] [SeqScan]

The problem is here: First bind t1 and t1.b, t1.a, then the subquery. When binding the subquery, it hit these code, and get t1.a but not t2.a.
https://github.com/KipData/FnckSQL/blob/f8488f4c56ce1bef38a437476f78cfb9add77eb1/src/binder/expr.rs#L291-L299

@KKould
Copy link
Member Author

KKould commented Mar 16, 2024

You are a genius! Found so many issues and I fixed them, please check them out sir!

let _ = fnck_sql
    .run("CREATE TABLE t1(pk int primary key, a int, b int);")
    .await?;
let _ = fnck_sql
    .run("CREATE TABLE t2(pk int primary key, a int, c int);")
    .await?;
let (schema, tuples) = fnck_sql
    .run("explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where a > 1);")
    .await?;
+-------------------------------------------------------------------+
| PLAN                                                              |
+===================================================================+
| Projection [t1.b] [Project]                                       |
|   LeftSemi Join On t1.a = (t2.a) as (_temp_table_1_.a) [HashJoin] |
|     Scan t1 -> [a, b] [SeqScan]                                   |
|     Projection [t2.a] [Project]                                   |
|       Filter (t2.a > 1), Is Having: false [Filter]                |
|         Scan t2 -> [a] [SeqScan]                                  |
+-------------------------------------------------------------------+

@KKould
Copy link
Member Author

KKould commented Mar 16, 2024

Have you tried this before?

CREATE TABLE t1(pk int primary key, a int, b int);
CREATE TABLE t2(pk int primary key, a int, c int);
explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where a > 1);
                               PLAN
-------------------------------------------------------------------
 Projection [t1.b] [Project]                                      +
   LeftSemi Join On t1.a = (t1.a) as (_temp_table_1_.a) [HashJoin]+
     Scan t1 -> [a, b] [SeqScan]                                  +
     Projection [t1.a] [Project]                                  +
       Filter (t1.a > 1), Is Having: false [Filter]               +
         Scan t2 -> [] [SeqScan]

The problem is here: First bind t1 and t1.b, t1.a, then the subquery. When binding the subquery, it hit these code, and get t1.a but not t2.a.

@crwen
Please put this case in tests/slt/subquery.slt to avoid similar problems from recurring.

@crwen
Copy link
Member

crwen commented Mar 16, 2024

consider this also:

explain select a from t1 where a in (select t2.a from t1 as t2  where t2.pk > 10);
                               PLAN
-------------------------------------------------------------------
 Projection [t1.a] [Project]                                      +
   LeftSemi Join On t1.a = (t1.a) as (_temp_table_1_.a) [HashJoin]+
     Scan t1 -> [a] [SeqScan]                                     +
     Projection [t1.a] [Project]                                  +
       Filter (t1.pk > 10), Is Having: false [Filter]             +
         Scan t1 -> [pk, a] [SeqScan]

When calling try_reference, a alias will be eliminated and lead to a wrong pos.

https://github.com/KipData/FnckSQL/blob/f8488f4c56ce1bef38a437476f78cfb9add77eb1/src/expression/mod.rs#L131-L149

@KKould
Copy link
Member Author

KKould commented Mar 16, 2024

consider this also:

explain select a from t1 where a in (select t2.a from t1 as t2  where t2.pk > 10);
                               PLAN
-------------------------------------------------------------------
 Projection [t1.a] [Project]                                      +
   LeftSemi Join On t1.a = (t1.a) as (_temp_table_1_.a) [HashJoin]+
     Scan t1 -> [a] [SeqScan]                                     +
     Projection [t1.a] [Project]                                  +
       Filter (t1.pk > 10), Is Having: false [Filter]             +
         Scan t1 -> [pk, a] [SeqScan]

When calling try_reference, a alias will be eliminated and lead to a wrong pos.

Great, your observation skills are amazing.

I fixed the problem. Now the pos in the case you gave seems to be the same as before, but it is correct. I checked it in the new commit. The corresponding relationship of pos.

By the way, their pos may be the same, but they actually mean the positions in the respective left tuple and right tuple. This is because expr needs to be used in HashJoin to calculate the hash of the respective tuple. value, so if you need to use on_keys, please ensure that on_keys is calculated based on tuples that have not yet been merged.

left_build: https://github.com/KipData/FnckSQL/blob/main/src/execution/volcano/dql/join/hash_join.rs#L119
right_probe: https://github.com/KipData/FnckSQL/blob/main/src/execution/volcano/dql/join/hash_join.rs#L144

best_plan plan: LogicalPlan {
    operator: Project(
        ProjectOperator {
            exprs: [
                Reference {
                    expr: ColumnRef(
                        ColumnCatalog {
                            summary: ColumnSummary {
                                id: Some(
                                    1,
                                ),
                                name: "a",
                                table_name: Some(
                                    "t1",
                                ),
                            },
                            nullable: true,
                            desc: ColumnDesc {
                                column_datatype: Integer,
                                is_primary: false,
                                is_unique: false,
                                default: None,
                            },
                        },
                    ),
                    pos: 0,
                },
            ],
        },
    ),
    childrens: [
        LogicalPlan {
            operator: Join(
                JoinOperator {
                    on: On {
                        on: [
                            (
                                Reference {
                                    expr: ColumnRef(
                                        ColumnCatalog {
                                            summary: ColumnSummary {
                                                id: Some(
                                                    1,
                                                ),
                                                name: "a",
                                                table_name: Some(
                                                    "t1",
                                                ),
                                            },
                                            nullable: true,
                                            desc: ColumnDesc {
                                                column_datatype: Integer,
                                                is_primary: false,
                                                is_unique: false,
                                                default: None,
                                            },
                                        },
                                    ),
                                    pos: 0,
                                },
                                Reference {
                                    expr: Alias {
                                        expr: ColumnRef(
                                            ColumnCatalog {
                                                summary: ColumnSummary {
                                                    id: Some(
                                                        1,
                                                    ),
                                                    name: "a",
                                                    table_name: Some(
                                                        "t1",
                                                    ),
                                                },
                                                nullable: true,
                                                desc: ColumnDesc {
                                                    column_datatype: Integer,
                                                    is_primary: false,
                                                    is_unique: false,
                                                    default: None,
                                                },
                                            },
                                        ),
                                        alias: Expr(
                                            ColumnRef(
                                                ColumnCatalog {
                                                    summary: ColumnSummary {
                                                        id: Some(
                                                            1,
                                                        ),
                                                        name: "a",
                                                        table_name: Some(
                                                            "_temp_table_1_",
                                                        ),
                                                    },
                                                    nullable: true,
                                                    desc: ColumnDesc {
                                                        column_datatype: Integer,
                                                        is_primary: false,
                                                        is_unique: false,
                                                        default: None,
                                                    },
                                                },
                                            ),
                                        ),
                                    },
                                    pos: 0,
                                },
                            ),
                        ],
                        filter: None,
                    },
                    join_type: LeftSemi,
                },
            ),
            childrens: [
                LogicalPlan {
                    operator: Scan(
                        ScanOperator {
                            table_name: "t1",
                            primary_key: 0,
                            columns: [
                                (
                                    1,
                                    ColumnCatalog {
                                        summary: ColumnSummary {
                                            id: Some(
                                                1,
                                            ),
                                            name: "a",
                                            table_name: Some(
                                                "t1",
                                            ),
                                        },
                                        nullable: true,
                                        desc: ColumnDesc {
                                            column_datatype: Integer,
                                            is_primary: false,
                                            is_unique: false,
                                            default: None,
                                        },
                                    },
                                ),
                            ],
                            limit: (
                                None,
                                None,
                            ),
                            index_infos: [
                                IndexInfo {
                                    meta: IndexMeta {
                                        id: 0,
                                        column_ids: [
                                            0,
                                        ],
                                        table_name: "t1",
                                        pk_ty: Integer,
                                        name: "pk_pk",
                                        ty: PrimaryKey,
                                    },
                                    range: None,
                                },
                            ],
                        },
                    ),
                    childrens: [],
                    physical_option: Some(
                        SeqScan,
                    ),
                    _output_schema_ref: None,
                },
                LogicalPlan {
                    operator: Project(
                        ProjectOperator {
                            exprs: [
                                Alias {
                                    expr: Reference {
                                        expr: ColumnRef(
                                            ColumnCatalog {
                                                summary: ColumnSummary {
                                                    id: Some(
                                                        1,
                                                    ),
                                                    name: "a",
                                                    table_name: Some(
                                                        "t1",
                                                    ),
                                                },
                                                nullable: true,
                                                desc: ColumnDesc {
                                                    column_datatype: Integer,
                                                    is_primary: false,
                                                    is_unique: false,
                                                    default: None,
                                                },
                                            },
                                        ),
                                        pos: 0,
                                    },
                                    alias: Expr(
                                        ColumnRef(
                                            ColumnCatalog {
                                                summary: ColumnSummary {
                                                    id: Some(
                                                        1,
                                                    ),
                                                    name: "a",
                                                    table_name: Some(
                                                        "_temp_table_1_",
                                                    ),
                                                },
                                                nullable: true,
                                                desc: ColumnDesc {
                                                    column_datatype: Integer,
                                                    is_primary: false,
                                                    is_unique: false,
                                                    default: None,
                                                },
                                            },
                                        ),
                                    ),
                                },
                            ],
                        },
                    ),
                    childrens: [
                        LogicalPlan {
                            operator: Project(
                                ProjectOperator {
                                    exprs: [
                                        Reference {
                                            expr: ColumnRef(
                                                ColumnCatalog {
                                                    summary: ColumnSummary {
                                                        id: Some(
                                                            1,
                                                        ),
                                                        name: "a",
                                                        table_name: Some(
                                                            "t1",
                                                        ),
                                                    },
                                                    nullable: true,
                                                    desc: ColumnDesc {
                                                        column_datatype: Integer,
                                                        is_primary: false,
                                                        is_unique: false,
                                                        default: None,
                                                    },
                                                },
                                            ),
                                            pos: 1,
                                        },
                                    ],
                                },
                            ),
                            childrens: [
                                LogicalPlan {
                                    operator: Filter(
                                        FilterOperator {
                                            predicate: Binary {
                                                op: Gt,
                                                left_expr: Reference {
                                                    expr: ColumnRef(
                                                        ColumnCatalog {
                                                            summary: ColumnSummary {
                                                                id: Some(
                                                                    0,
                                                                ),
                                                                name: "pk",
                                                                table_name: Some(
                                                                    "t1",
                                                                ),
                                                            },
                                                            nullable: false,
                                                            desc: ColumnDesc {
                                                                column_datatype: Integer,
                                                                is_primary: true,
                                                                is_unique: false,
                                                                default: None,
                                                            },
                                                        },
                                                    ),
                                                    pos: 0,
                                                },
                                                right_expr: Constant(
                                                    Int32(0),
                                                ),
                                                ty: Boolean,
                                            },
                                            having: false,
                                        },
                                    ),
                                    childrens: [
                                        LogicalPlan {
                                            operator: Scan(
                                                ScanOperator {
                                                    table_name: "t1",
                                                    primary_key: 0,
                                                    columns: [
                                                        (
                                                            0,
                                                            ColumnCatalog {
                                                                summary: ColumnSummary {
                                                                    id: Some(
                                                                        0,
                                                                    ),
                                                                    name: "pk",
                                                                    table_name: Some(
                                                                        "t1",
                                                                    ),
                                                                },
                                                                nullable: false,
                                                                desc: ColumnDesc {
                                                                    column_datatype: Integer,
                                                                    is_primary: true,
                                                                    is_unique: false,
                                                                    default: None,
                                                                },
                                                            },
                                                        ),
                                                        (
                                                            1,
                                                            ColumnCatalog {
                                                                summary: ColumnSummary {
                                                                    id: Some(
                                                                        1,
                                                                    ),
                                                                    name: "a",
                                                                    table_name: Some(
                                                                        "t1",
                                                                    ),
                                                                },
                                                                nullable: true,
                                                                desc: ColumnDesc {
                                                                    column_datatype: Integer,
                                                                    is_primary: false,
                                                                    is_unique: false,
                                                                    default: None,
                                                                },
                                                            },
                                                        ),
                                                    ],
                                                    limit: (
                                                        None,
                                                        None,
                                                    ),
                                                    index_infos: [
                                                        IndexInfo {
                                                            meta: IndexMeta {
                                                                id: 0,
                                                                column_ids: [
                                                                    0,
                                                                ],
                                                                table_name: "t1",
                                                                pk_ty: Integer,
                                                                name: "pk_pk",
                                                                ty: PrimaryKey,
                                                            },
                                                            range: Some(
                                                                Scope {
                                                                    min: Excluded(
                                                                        Int32(0),
                                                                    ),
                                                                    max: Unbounded,
                                                                },
                                                            ),
                                                        },
                                                    ],
                                                },
                                            ),
                                            childrens: [],
                                            physical_option: Some(
                                                SeqScan,
                                            ),
                                            _output_schema_ref: None,
                                        },
                                    ],
                                    physical_option: Some(
                                        Filter,
                                    ),
                                    _output_schema_ref: None,
                                },
                            ],
                            physical_option: Some(
                                Project,
                            ),
                            _output_schema_ref: None,
                        },
                    ],
                    physical_option: Some(
                        Project,
                    ),
                    _output_schema_ref: None,
                },
            ],
            physical_option: Some(
                HashJoin,
            ),
            _output_schema_ref: None,
        },
    ],
    physical_option: Some(
        Project,
    ),
    _output_schema_ref: None,
}

@crwen
Copy link
Member

crwen commented Mar 17, 2024

It seems to project twice on temp table.

And I don't think it can handle more complex situations.

explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where (select a from t1));
                               PLAN
-------------------------------------------------------------------
 Projection [t1.b] [Project]                                      +
   LeftSemi Join On t1.a = (t2.a) as (_temp_table_2_.a) [HashJoin]+
     Scan t1 -> [a, b] [SeqScan]                                  +
     Projection [(t2.a) as (_temp_table_2_.a)] [Project]          +
       Projection [t2.a] [Project]                                +
         Inner Join Where _temp_table_1_.a [HashJoin]             +
           Scan t2 -> [a] [SeqScan]                               +
           Projection [(t2.a) as (_temp_table_1_.a)] [Project]    +
             Projection [t2.a] [Project]                          +
               Scan t1 -> [] [SeqScan]

I think every query using their own binder context may be a solution, but it may be a big work.

@KKould
Copy link
Member Author

KKould commented Mar 17, 2024

It seems to project twice on temp table.

And I don't think it can handle more complex situations.

explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where (select a from t1));
                               PLAN
-------------------------------------------------------------------
 Projection [t1.b] [Project]                                      +
   LeftSemi Join On t1.a = (t2.a) as (_temp_table_2_.a) [HashJoin]+
     Scan t1 -> [a, b] [SeqScan]                                  +
     Projection [(t2.a) as (_temp_table_2_.a)] [Project]          +
       Projection [t2.a] [Project]                                +
         Inner Join Where _temp_table_1_.a [HashJoin]             +
           Scan t2 -> [a] [SeqScan]                               +
           Projection [(t2.a) as (_temp_table_1_.a)] [Project]    +
             Projection [t2.a] [Project]                          +
               Scan t1 -> [] [SeqScan]

I think every query using their own binder context may be a solution, but it may be a big work.

Yes, let me create an issue for this problem to avoid affecting your PR.

@KKould
Copy link
Member Author

KKould commented Mar 17, 2024

For the two Projects, I referred to Datafusion’s TableAlias ​​and Project.

@KKould
Copy link
Member Author

KKould commented Mar 19, 2024

close by #164

@KKould KKould closed this Mar 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants