Skip to content

opt: add an exploration rule to convert correlated self-join with LIMIT 1 into LATERAL #151503

@yuzefovich

Description

@yuzefovich

Inspired by this support ticket.

Consider the following setup:

CREATE TABLE activity (
    shard
        INT8 NOT NULL,
    id
        STRING NOT NULL,
    created_at
        TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (shard ASC, id ASC)
);

CREATE TABLE log (
    shard
        INT8 NOT NULL,
    id
        STRING NOT NULL,
    created_at
        TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (shard ASC, id ASC),
    UNIQUE (id, created_at)
);

EXPLAIN
    SELECT
        *
    FROM
        activity
        LEFT JOIN log ON
                log.id = activity.id
                AND log.created_at
                    = (
                            SELECT
                                log1.created_at
                            FROM
                                log AS log1
                            WHERE
                                log1.id = log.id
                                AND log1.created_at
                                    < activity.created_at
                            ORDER BY
                                log1.created_at
                                    DESC
                            LIMIT
                                1
                        );

Michael and I think that the query can be rewritten with an equivalent transformation like so

EXPLAIN
    SELECT
        *
    FROM
        activity
        LEFT JOIN LATERAL (
                SELECT
                    log.*
                FROM
                    log
                WHERE
                    log.created_at
                    < activity.created_at
                ORDER BY
                    log.created_at DESC
                LIMIT
                    1
            )
                AS log ON log.id = activity.id;

We should consider adding an exploration rule like this. Some necessary conditions are:

  • LATERAL join will always return one row, whereas LEFT JOIN might return multiple. So it must be the case, that for a given log.id value, all created_at values are unique.
  • all predicates on log table must be included within the correlated subquery (i.e. if we have a predicate on log outside, the transformation is not valid).
  • perhaps need to also consider nullability of the columns.

Jira issue: CRDB-53316

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-sql-optimizerSQL logical planning and optimizations.C-performancePerf of queries or internals. Solution not expected to change functional behavior.O-supportWould prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docsP-3Issues/test failures with no fix SLAT-sql-queriesSQL Queries Team

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions