Skip to content

Tweak optimal default batch size #6916

@Dandandan

Description

@Dandandan

Is your feature request related to a problem or challenge?

A small test to double the batch size (from 8192 to 16384) shows some performance improvements (~10%) on some queries:

Comparing main and batch_size
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ batch_size ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 190.81ms │   186.92ms │     no change │
│ QQuery 2     │  59.14ms │    55.43ms │ +1.07x faster │
│ QQuery 3     │  47.03ms │    44.95ms │     no change │
│ QQuery 4     │  37.16ms │    33.60ms │ +1.11x faster │
│ QQuery 5     │  95.66ms │    86.47ms │ +1.11x faster │
│ QQuery 6     │  10.47ms │     8.93ms │ +1.17x faster │
│ QQuery 7     │ 196.26ms │   192.92ms │     no change │
│ QQuery 8     │  69.94ms │    63.64ms │ +1.10x faster │
│ QQuery 9     │ 136.17ms │   129.56ms │     no change │
│ QQuery 10    │  99.69ms │    97.21ms │     no change │
│ QQuery 11    │  45.22ms │    45.94ms │     no change │
│ QQuery 12    │  67.66ms │    65.41ms │     no change │
│ QQuery 13    │ 182.43ms │   156.67ms │ +1.16x faster │
│ QQuery 14    │  12.36ms │    11.16ms │ +1.11x faster │
│ QQuery 15    │  23.18ms │    20.79ms │ +1.11x faster │
│ QQuery 16    │  48.14ms │    47.22ms │     no change │
│ QQuery 17    │ 681.34ms │   682.63ms │     no change │
│ QQuery 18    │ 516.47ms │   490.86ms │     no change │
│ QQuery 19    │  58.40ms │    56.68ms │     no change │
│ QQuery 20    │ 193.13ms │   202.12ms │     no change │
│ QQuery 21    │ 261.12ms │   242.51ms │ +1.08x faster │
│ QQuery 22    │  27.81ms │    28.08ms │     no change │
└──────────────┴──────────┴────────────┴───────────────┘

This is nice for such a small change and aligns with #6287

Describe the solution you'd like

Configure a (more) optimal default

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestperformanceMake DataFusion faster

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions