[WIP][Gluon] Support tensor descriptors and explicit load/store/prefetch operations #5586

mieshkiwrk · 2025-12-01T09:03:16Z

Extracted calculateWarpsPerTile and calculateRepCluster from TritonIntelGPUAttrDefs, along with calculateDPASRepetitions, and exposed them to Python as the calculate_warps_per_tile and calculate_rep_cluster methods, so that the Gluon GEMM benchmark runs on the same layouts as the Triton one for an apples-to-apples comparison.
Added Gluon GEMM and batched GEMM kernels with the same autotune parameters as the Triton versions.
Added a tensor_descriptor interface for Gluon with load/load_2d/store/store_2d/prefetch/prefetch_2d operations (converted into block pointers underneath for now).
The add_convert_tdesc_to_block_pointer pass now takes the layout and the ttig.block_io attribute into account (previously this information was lost).
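
For readers unfamiliar with descriptor-based access, here is a minimal host-side sketch of what a 2D tensor descriptor load/store/prefetch means conceptually. It is not the Gluon interface added in this PR: the class name `TensorDescriptor2D`, its fields, and its method signatures are hypothetical, and the sketch assumes that out-of-bounds elements read as zero and out-of-bounds writes are dropped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TensorDescriptor2D:
    """Hypothetical pure-Python model of a 2D tensor descriptor (not the Gluon API)."""
    base: List[float]              # flat backing storage
    shape: Tuple[int, int]         # (rows, cols) of the full tensor
    strides: Tuple[int, int]       # (row_stride, col_stride), in elements
    block_shape: Tuple[int, int]   # (block_rows, block_cols) moved per load

    def _index(self, r: int, c: int) -> int:
        # Same base + offset * stride arithmetic a block pointer would use.
        return r * self.strides[0] + c * self.strides[1]

    def _in_bounds(self, r: int, c: int) -> bool:
        return 0 <= r < self.shape[0] and 0 <= c < self.shape[1]

    def load(self, offsets: Tuple[int, int]) -> List[List[float]]:
        """Read one block; assumed semantics: out-of-bounds elements are zero."""
        r0, c0 = offsets
        rows, cols = self.block_shape
        return [
            [
                self.base[self._index(r0 + i, c0 + j)]
                if self._in_bounds(r0 + i, c0 + j) else 0.0
                for j in range(cols)
            ]
            for i in range(rows)
        ]

    def store(self, offsets: Tuple[int, int], block: List[List[float]]) -> None:
        """Write one block; assumed semantics: out-of-bounds writes are skipped."""
        r0, c0 = offsets
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                if self._in_bounds(r0 + i, c0 + j):
                    self.base[self._index(r0 + i, c0 + j)] = value

    def prefetch(self, offsets: Tuple[int, int]) -> None:
        """No-op in this model; on hardware it would only warm the cache."""
        return None


# A 3x4 row-major tensor holding 0.0 .. 11.0, accessed in 2x2 blocks.
desc = TensorDescriptor2D(
    base=[float(i) for i in range(12)],
    shape=(3, 4),
    strides=(4, 1),
    block_shape=(2, 2),
)
tile = desc.load((1, 2))   # fully in bounds: [[6.0, 7.0], [10.0, 11.0]]
edge = desc.load((2, 3))   # partly out of bounds: [[11.0, 0.0], [0.0, 0.0]]
```

The point of the model is that a descriptor bundles base, shape, strides, and block shape, so each operation only needs the block offsets. That is the same information a block pointer carries, which is why lowering one to the other is possible.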

Will include some performance results from PVC soon

Data for BMG - B580

LIBIGC1_VERSION=2.18.5-1188
LEVEL_ZERO_VERSION=1.24.1-1~24.04
AGAMA_VERSION=1188
GPU_DEVICE=Intel(R) Arc(TM) B580 Graphics
TORCH_VERSION=2.10.0a0+git01f94d4
COMPILER_VERSION=2025.3.1

Data for PVC - Max 1550

LIBIGC1_VERSION=2.20.5-1206
LEVEL_ZERO_VERSION=1.24.3-1~24.04
AGAMA_VERSION=1206
GPU_DEVICE=Intel(R) Data Center GPU Max 1550
TORCH_VERSION=2.10.0a0+git01f94d4
COMPILER_VERSION=2025.3.1

mieshkiwrk added 2 commits November 28, 2025 21:06

Check layout within tensor_descriptor -> block_pointer conversion pass

93a910f

Test support for tensor_descriptors in gluon, initial load/store/prefetch operations

acb526d

mieshkiwrk marked this pull request as draft December 1, 2025 09:03

Linter fix + add annotate_module for gluon

024f495

mieshkiwrk mentioned this pull request Dec 1, 2025

[Gluon] Enable GEMM benchmark, reach perf parity with Triton implementation #5263

Open

mieshkiwrk added 8 commits December 3, 2025 10:16

Removed is_2d_block flag, added separate op + initial gluon provider for gemm benchmark

2aa5ff7

Separated optimal DPAS layout calculation logic, exposed that for python, enabled all gemm benchmark cases for gluon

ba1a5b7

Merge branch 'main' into mdziado/gluon

86e1a9e

Minor changes

b8e202a

Merge branch 'mdziado/gluon' of https://github.com/intel/intel-xpu-backend-for-triton into mdziado/gluon

82bb685

Removed comments

a5e5495

Removed comment

b769738

Typo fix

d4494b6
