
Missing CPU features attributes on dispatch functions lead to UB / missed target instructions #16670

@bjacob

Description

Testcase: just an i8 x i8 -> i32 matmul:

func.func @matmul_dynamic(%lhs: tensor<?x?xi8>, %rhs: tensor<?x?xi8>, %acc: tensor<?x?xi32>) -> tensor<?x?xi32> {
  %result = linalg.matmul ins(%lhs, %rhs: tensor<?x?xi8>, tensor<?x?xi8>) outs(%acc: tensor<?x?xi32>) -> tensor<?x?xi32>
  return %result: tensor<?x?xi32>
}

Reproduce:

tools/iree-compile \
  ~/matmul_i8.mlir -o /tmp/a.vmfb \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=znver4 \
  --iree-llvmcpu-enable-ukernels=all \
  --iree-hal-dump-executable-intermediates-to=/tmp \
  -mlir-disable-threading \
  -mlir-print-ir-after-all \
  2>/tmp/log

Inspection of the generated assembly /tmp/module_matmul_i8_linked_llvm_cpu_embedded_elf_x86_64.s shows that baseline AVX-512 code is generated (VPMADDWD) instead of the expected AVX-512-VNNI code (VPDPWSSD):

matmul_dynamic_dispatch_3_mmt4d_DxDxDx16x16x2_i8xi8xi32:
[...]
	vshufi64x2	$27, %zmm16, %zmm16, %zmm19
	vpmaddwd	%zmm16, %zmm21, %zmm24
	vpmaddwd	%zmm17, %zmm21, %zmm26
	vpmaddwd	%zmm18, %zmm21, %zmm25
	vpmaddwd	%zmm19, %zmm21, %zmm21
[...]

Why? The dumped intermediates show that all the way through the post-linking optimized IR (/tmp/module_matmul_i8_linked_llvm_cpu_embedded_elf_x86_64.optimized.ll), the code was still calling the expected AVX-512-VNNI intrinsic:

define internal noundef i32 @matmul_dynamic_dispatch_3_mmt4d_DxDxDx16x16x2_i8xi8xi32(ptr noalias nocapture nonnull readonly align 16 %0, ptr noalias nocapture nonnull readonly align 16 %1, ptr noalias nocapture nonnull readonly align 16 %2) #1 !dbg !90 {
[...]
  %358 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %334, <16 x i32> %354, <16 x i32> %347), !dbg !91
  %359 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %333, <16 x i32> %354, <16 x i32> %348), !dbg !91
  %360 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %332, <16 x i32> %354, <16 x i32> %349), !dbg !91
  %361 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %331, <16 x i32> %354, <16 x i32> %350), !dbg !91
  %362 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %330, <16 x i32> %355, <16 x i32> %347), !dbg !91
  %363 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %329, <16 x i32> %355, <16 x i32> %348), !dbg !91
[...]

But wait, what is that attribute #1 on that function? Does it have the required CPU feature enabled? Nope:

attributes #1 = { nofree norecurse nosync nounwind "frame-pointer"="all" "hot" "no-builtins" "nonlazybind" }
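
For contrast, a version of this function on which the vpdpwssd.512 call would be well-defined has to spell out the CPU/feature attributes, roughly like this (the exact feature string here is illustrative, not what IREE would actually emit):

attributes #1 = { nofree norecurse nosync nounwind "frame-pointer"="all" "hot" "no-builtins" "nonlazybind" "target-cpu"="znver4" "target-features"="+avx512f,+avx512bw,+avx512vnni" }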

So our code here is Undefined Behavior: the function calls an AVX-512-VNNI intrinsic without the corresponding target features in its attributes. Indeed, while initially minimizing this with llc, I ran into should-not-get-here crashes in x86 instruction selection. In our current e2e IREE use case, the Undefined Behavior doesn't crash or affect correctness, but it does cause us to miss the intended VNNI instruction.
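
For anyone who wants to poke at the instruction-selection side of this outside of IREE, the dumped IR can be fed straight to llc; a sketch (flags and output paths are illustrative):

# Codegen with only whatever features the function attributes provide (here: none);
# this is the configuration that exposed the isel crashes during minimization.
llc /tmp/module_matmul_i8_linked_llvm_cpu_embedded_elf_x86_64.optimized.ll -o /tmp/repro.s

# Same IR, but with znver4 (and thus +avx512vnni) as the TargetMachine default, for comparison.
llc -mcpu=znver4 /tmp/module_matmul_i8_linked_llvm_cpu_embedded_elf_x86_64.optimized.ll -o /tmp/repro_znver4.s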

"Of course" this dispatch function doesn't have the required +avx512vnni CPU feature attribute, since we never put it there. The only functions that have the +avx512vnni CPU feature attribute are the ukernel internal VNNI implementation functions, which are compiled with this CPU feature enabled in the first place.

I guess I was expecting the attribute to be propagated from callee to caller as the VNNI inner tile function gets inlined first into iree_uk_mmt4d and then into the dispatch function. It's not.

How do we resolve this in a way that doesn't violate the target-specialization design in LLVMCPUTarget? @benvanik
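
One possible shape of a fix, purely as a sketch and not a claim about where it should live (whether it is compatible with the multi-target-specialization design is exactly the open question): have the target stamp its configured CPU/feature strings onto every dispatch function it emits, so that inlined target intrinsics stay consistent with the function's declared subtarget. The helper below is hypothetical, not IREE's actual API:

// Hypothetical helper: stamp the target's configured CPU and feature strings
// onto every defined function in the module before the LLVM optimization
// pipeline runs, so inlined intrinsic calls (like llvm.x86.avx512.vpdpwssd.512)
// remain legal for the per-function subtarget.
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

static void setTargetAttributes(llvm::Module &module, llvm::StringRef targetCPU,
                                llvm::StringRef targetFeatures) {
  for (llvm::Function &func : module) {
    if (func.isDeclaration())
      continue;
    // Leave functions that already carry explicit attributes alone
    // (e.g. the ukernel-internal VNNI inner-tile implementations).
    if (!func.hasFnAttribute("target-cpu"))
      func.addFnAttr("target-cpu", targetCPU);
    if (!func.hasFnAttribute("target-features"))
      func.addFnAttr("target-features", targetFeatures);
  }
}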
