@mieshkiwrk mieshkiwrk commented Dec 1, 2025

  • Extracted calculateWarpsPerTile and calculateRepCluster, along with calculateDPASRepetitions, from TritonIntelGPUAttrDefs and exposed them to Python as the calculate_warps_per_tile and calculate_rep_cluster methods, so the Gluon GEMM benchmark can run on the same layouts as the Triton one for an apples-to-apples comparison
  • Added Gluon GEMM and batched GEMM kernels with the same autotune parameters as the Triton versions
  • Added a tensor_descriptor interface for Gluon with load/load_2d/store/store_2d/prefetch/prefetch_2d functionality (converted into block pointers underneath for now)
  • The add_convert_tdesc_to_block_pointer pass now takes the layout into account (previously this information was lost), as well as the ttig.block_io attribute
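
For comparing the Gluon and Triton GEMM kernels on the same shape, timings are usually normalized to throughput. The sketch below is not part of this PR: `gemm_tflops` and `relative_throughput` are hypothetical helper names, and it assumes the standard count of 2*M*N*K floating-point operations per matrix product and a timing given in milliseconds.

```python
def gemm_tflops(m: int, n: int, k: int, time_ms: float, batch: int = 1) -> float:
    """Throughput of a (batched) GEMM in TFLOPS.

    Assumes 2*M*N*K floating-point operations per matrix product
    (one multiply and one add per inner-product term).
    """
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    flops = 2.0 * batch * m * n * k
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12


def relative_throughput(candidate_ms: float, baseline_ms: float) -> float:
    """Speedup of the candidate over the baseline on the same problem shape.

    A value above 1.0 means the candidate kernel finished faster.
    """
    if candidate_ms <= 0 or baseline_ms <= 0:
        raise ValueError("timings must be positive")
    return baseline_ms / candidate_ms
```

With a placeholder timing (not a measurement), `gemm_tflops(1000, 1000, 1000, 2.0)` corresponds to 2e9 operations in 2 ms, i.e. about 1 TFLOPS. The ratio from `relative_throughput` is only meaningful when both kernels ran the same shape, data type, and layout, which is what sharing the layout helpers and autotune parameters is meant to ensure.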

Will include some performance results from PVC soon

Data for BMG - B580

LIBIGC1_VERSION=2.18.5-1188
LEVEL_ZERO_VERSION=1.24.1-1~24.04
AGAMA_VERSION=1188
GPU_DEVICE=Intel(R) Arc(TM) B580 Graphics
TORCH_VERSION=2.10.0a0+git01f94d4
COMPILER_VERSION=2025.3.1

Data for PVC - Max 1550

LIBIGC1_VERSION=2.20.5-1206
LEVEL_ZERO_VERSION=1.24.3-1~24.04
AGAMA_VERSION=1206
GPU_DEVICE=Intel(R) Data Center GPU Max 1550
TORCH_VERSION=2.10.0a0+git01f94d4
COMPILER_VERSION=2025.3.1

@mieshkiwrk mieshkiwrk marked this pull request as draft December 1, 2025 09:03