Skip to content

[DISCUSSION] Sorts being removed from subqueries #15886

@maxburke

Description

@maxburke

Describe the bug

Referenced discussion: https://the-asf.slack.com/archives/C01QUFS30TD/p1745875862723149

Given this table:

> create table d1 (ul_node_id string);

It appears that sorts are being removed from inner expressions. For example, here is a query with a sort expression:

> explain SELECT "ul_node_id", encode(ul_node_id, 'base64') FROM d1   ORDER BY encode(ul_node_id, 'base64') ASC NULLS LAST;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │          SortExec         │ |
|               | │    --------------------   │ |
|               | │ encode(d1.ul_node_id,Utf8(│ |
|               | │   "base64"))@1 ASC NULLS  │ |
|               | │            LAST           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │ encode(d1.ul_node_id,Utf8(│ |
|               | │        "base64")):        │ |
|               | │ encode(ul_node_id, base64)│ |
|               | │                           │ |
|               | │        ul_node_id:        │ |
|               | │         ul_node_id        │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │          bytes: 0         │ |
|               | │       format: memory      │ |
|               | │          rows: 0          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.

If I wrap it in a no-op CTE the SortExec stage disappears:

> explain WITH t0 AS (SELECT "ul_node_id", encode(ul_node_id, 'base64') FROM d1   ORDER BY encode(ul_node_id, 'base64') ASC NULLS LAST) SELECT  * FROM t0;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │ encode(d1.ul_node_id,Utf8(│ |
|               | │        "base64")):        │ |
|               | │ encode(ul_node_id, base64)│ |
|               | │                           │ |
|               | │        ul_node_id:        │ |
|               | │         ul_node_id        │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │          bytes: 0         │ |
|               | │       format: memory      │ |
|               | │          rows: 0          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.

Manually hoisting out the ORDER BY from the CTE causes it to appear in the plan:

> explain WITH t0 AS (SELECT "ul_node_id", encode(ul_node_id, 'base64') FROM d1) SELECT * from t0   ORDER BY encode(ul_node_id, 'base64') ASC NULLS LAST;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │          SortExec         │ |
|               | │    --------------------   │ |
|               | │    encode(ul_node_id@0,   │ |
|               | │      base64) ASC NULLS    │ |
|               | │            LAST           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │ encode(d1.ul_node_id,Utf8(│ |
|               | │        "base64")):        │ |
|               | │ encode(ul_node_id, base64)│ |
|               | │                           │ |
|               | │        ul_node_id:        │ |
|               | │         ul_node_id        │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │          bytes: 0         │ |
|               | │       format: memory      │ |
|               | │          rows: 0          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions