Implementation of the CoopVec Inference and Training builtin intrinsics #7290

anupamachandra · 2025-04-01T19:10:10Z

Implements
HLSL:
__builtin_MatVecMul
__builtin_MatVecMulAdd
__builtin_OuterProductAccumulate
__builtin_VectorAccumulate

Lowered to
DXIL:
@dx.op.matVecMul
@dx.op.matVecMulAdd
@dx.op.outerProductAccumulate
@dx.op.vectorAccumulate

github-actions · 2025-04-01T19:11:09Z

✅ With the latest revision this PR passed the Python code formatter.

damyanp · 2025-04-02T00:06:12Z

NOTE: this is a general issue with long vectors tracked by #7297. I'll keep this comment here since it has an interesting case we might want to test in it.

This applies to all the builtins I've tried so far, but the VectorAccumulate example is quite minimal. Given this code:

```hlsl
RWByteAddressBuffer RWBuf;

export void TruncatedVector(vector<half, 254> Input254, vector<half, 255> Input255) {
  __builtin_VectorAccumulate(Input254, RWBuf, 0);
  __builtin_VectorAccumulate(Input255, RWBuf, 0);
}
```

This generates:

```llvm
; Function Attrs: nounwind
define void @"\01?TruncatedVector@@YAXV?$vector@$halff@$0PO@@@V?$vector@$halff@$0PP@@@@Z"(<254 x float> %Input254, <255 x float> %Input255) #0 {
  %1 = load %dx.types.Handle, %dx.types.Handle* @"\01?RWBuf@@3URWByteAddressBuffer@@A", align 4
  %2 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1)  ; CreateHandleForLib(Resource)
  %3 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %2, %dx.types.ResourceProperties { i32 4107, i32 0 })  ; AnnotateHandle(res,props)  resource: RWByteAddressBuffer
  call void @dx.op.vectorAccumulate.v254f32(i32 308, <254 x float> %Input254, %dx.types.Handle %3, i32 0)  ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
  %4 = shufflevector <255 x float> %Input255, <255 x float> undef, <1 x i32> zeroinitializer
  %5 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1)  ; CreateHandleForLib(Resource)
  %6 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %5, %dx.types.ResourceProperties { i32 4107, i32 0 })  ; AnnotateHandle(res,props)  resource: RWByteAddressBuffer
  call void @dx.op.vectorAccumulate.v1f32(i32 308, <1 x float> %4, %dx.types.Handle %6, i32 0)  ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
  ret void
}
```

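Note how Input255 is explicitly truncated to `<1 x float>` (the `shufflevector` into `%4`) before `vectorAccumulate` is called, so the second call uses the `v1f32` overload. Input254 is not truncated and is passed through as `<254 x float>`.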
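One way to turn this case into a regression check is to compare the overload width in each `vectorAccumulate` call against the width declared in the HLSL. The Python sketch below is only an illustration: the helper names (`accumulate_widths`, `find_truncations`) are hypothetical and not part of DXC, the sample IR is abbreviated from the output above, and the repository's own tests would more likely express the same check as FileCheck patterns.

```python
import re

# Matches the overload suffix in calls such as
#   call void @dx.op.vectorAccumulate.v254f32(i32 308, <254 x float> %Input254, ...)
# and captures the element count (254 in that example).
_CALL_RE = re.compile(r"@dx\.op\.vectorAccumulate\.v(\d+)f32\(")


def accumulate_widths(ir_text):
    """Return the vector width of every vectorAccumulate call, in order."""
    return [int(n) for n in _CALL_RE.findall(ir_text)]


def find_truncations(ir_text, expected_widths):
    """Return (expected, actual) pairs for calls whose width does not match."""
    actual = accumulate_widths(ir_text)
    return [(want, got) for want, got in zip(expected_widths, actual) if want != got]


# Abbreviated from the IR in this comment (handle setup lines omitted).
SAMPLE_IR = """
  call void @dx.op.vectorAccumulate.v254f32(i32 308, <254 x float> %Input254, %dx.types.Handle %3, i32 0)
  %4 = shufflevector <255 x float> %Input255, <255 x float> undef, <1 x i32> zeroinitializer
  call void @dx.op.vectorAccumulate.v1f32(i32 308, <1 x float> %4, %dx.types.Handle %6, i32 0)
"""

# Widths declared in the HLSL signature: Input254, then Input255.
print(find_truncations(SAMPLE_IR, [254, 255]))
```

With the IR shown above, the only mismatch reported is the second call, where 255 elements were expected and 1 was emitted.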

github-actions · 2025-04-02T02:10:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

tex3d

Besides the generated content, I think this looks good. Just one small nit regarding DXIL Op descriptions.

I believe the generated content is out of date/incorrect, since I noticed some deleted operations and a missing .json file update. In any case, generated files will need to be updated before the final PR is ready for merging.

tex3d · 2025-04-18T19:37:16Z

Just an FYI:
Our gcc pipelines started failing today because of a docker image update that moves to a newer Ubuntu (ultimately Ubuntu 24.04 by 5/9), which will require us to bump our gcc version by a few releases. A fix is in the works, but in the meantime I think we should override this failure and merge once the other pipelines are complete, provided there are no other failures.

…ics (microsoft#7290)

Implements HLSL:
__builtin_MatVecMul
__builtin_MatVecMulAdd
__builtin_OuterProductAccumulate
__builtin_VectorAccumulate

Lowered to DXIL:
@dx.op.matVecMul
@dx.op.matVecMulAdd
@dx.op.outerProductAccumulate
@dx.op.vectorAccumulate

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Damyan Pepper
Co-authored-by: Simon Moll
Co-authored-by: Tex Riddell
Co-authored-by: Chris B

(cherry picked from commit 1db8c5b)

anupamachandra added 3 commits April 1, 2025 11:47

Implement MulAdd, OutProdAcc, VecAcc lowering

75bb05e

Add is signed parameters to the builtins

913c352

Change parameter names for better readability

a6863d4

github-project-automation bot added this to HLSL Roadmap Apr 1, 2025

github-project-automation bot moved this to New in HLSL Roadmap Apr 1, 2025

Keeping parameter names accurate

49bc4f0

anupamachandra marked this pull request as ready for review April 1, 2025 19:26

anupamachandra requested a review from a team as a code owner April 1, 2025 19:26

anupamachandra requested review from tex3d, damyanp, llvm-beanz and pow2clk April 1, 2025 19:27

damyanp added this to HLSL Support Apr 1, 2025

damyanp assigned tex3d, llvm-beanz and pow2clk Apr 1, 2025

damyanp reviewed Apr 1, 2025

utils/hct/hctdb.py (outdated)

damyanp reviewed Apr 2, 2025

utils/hct/gen_intrin_main.txt (outdated)

llvm-beanz reviewed Apr 2, 2025

lib/HLSL/HLOperationLower.cpp (outdated)

lib/DXIL/DxilShaderModel.cpp (outdated)

anupamachandra added 2 commits April 1, 2025 18:28

After clang-format

564689a

Fix variable names per LLVM coding standards

6ec25f8

chore: autopublish 2025-04-02T15:32:45Z

3081f9e

damyanp reviewed Apr 2, 2025

utils/hct/hctdb.py (outdated)

anupamachandra and others added 2 commits April 2, 2025 09:50

Update utils/hct/hctdb.py

ae115e1

Co-authored-by: Damyan Pepper

Change isSigned to isUnsigned

2b2656d

damyanp mentioned this pull request Apr 2, 2025

SM 6.9 cooperative vectors builtins need to be in 'dx' namespace #7301

Open

tex3d reviewed Apr 3, 2025

utils/hct/hctdb.py (outdated)

simoll added 10 commits April 15, 2025 09:25

Repair after DXILMatrixLayout -> LinalgMatrixLayout name change

503d4b8

Regen after attr change from 'None' -> 'ReadOnly'

797d89c

nfc: store output vector in tests to keep coopvec dxil ops alive

b8e4985

Remove redundant tests (subsumed by multioverload versions)

47c4f3f

CheckLinalgInterpretation for InMemory and InRegister type validation

cc6cc27

Align test with spec: "Note: Only Optimal layouts can be used with for Float8(E4M3 and E5M2) MatrixInterpretation."

d344e73

Align test with spec: "Packed" type conversions are bitcasts to a smaller type. The declared input type must be 32-bit unsigned integer.

faa8f8f

Fix DXIL OuterProductAccumulate param ordering (minterp, mlayout, mstride)

41217f3

Multioverload OuterProductAccumulate test (and remove redundant non-overload test)

1b23b26

nfc: autoformat

a3f76ef

bob80905 mentioned this pull request Apr 15, 2025

[CoopVec] Add Linear Algebra common header with tests #7350

Open

tex3d added 2 commits April 15, 2025 19:46

Merge remote-tracking branch 'ms/staging-sm6.9' into coop-vec-5

871694c

Update tests with new hl opcodes

791dff3

anupamachandra mentioned this pull request Apr 16, 2025

[0029] Update Spec with DXIL Validation changes microsoft/hlsl-specs#491

Open

anupamachandra added 3 commits April 17, 2025 23:09

Removed MatrixLayout check for OuterProductAccumulate, updated DXIL validation errors per review feedback, some cleanup

d5f03f2

Update linalg_builtin.hlsl to initialize input vectors for outerproductaccumulate and vector accumulate functions

f5a1a51

Removed undefs and added named checks for resource handles in the linalg ops

1538453

tex3d reviewed Apr 18, 2025

utils/hct/hctdb.py (outdated)

tex3d reviewed Apr 18, 2025

utils/hct/hctdb.py (outdated)

anupamachandra added 2 commits April 18, 2025 11:29

Update diagnostics per llvm coding guidelines

7cbe979

Missing trailing comma was bugging darker

faf6df0

tex3d approved these changes Apr 18, 2025

llvm-beanz approved these changes Apr 18, 2025

Fix linalg-builtins.hlsl file check: missing comma

8782246

tex3d approved these changes Apr 18, 2025

damyanp merged commit 1db8c5b into microsoft:staging-sm6.9 Apr 18, 2025
9 of 12 checks passed

github-project-automation bot moved this from New to Done in HLSL Roadmap Apr 18, 2025

damyanp moved this from Needs Review to Closed in HLSL Support Apr 22, 2025
