Implementation of the CoopVec Inference and Training builtin intrinsics #7290

Merged
merged 97 commits into from
Apr 18, 2025

anupamachandra
Collaborator

Implements
HLSL:
__builtin_MatVecMul
__builtin_MatVecMulAdd
__builtin_OuterProductAccumulate
__builtin_VectorAccumulate

Lowered to
DXIL:
@dx.op.matVecMul
@dx.op.matVecMulAdd
@dx.op.outerProductAccumulate
@dx.op.vectorAccumulate
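
As a rough guide to what these four operations compute, here is a minimal CPU reference sketch in C++. It is a model under stated assumptions, not the DXC implementation or the DXIL op signatures: `Mat`, `Vec`, and the function names are hypothetical, everything is dense row-major `float`, and the buffer handles, layouts, strides, and element interpretations that the real operations work with are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference model of the four operations.
// Assumption: dense, row-major, float-only data held in plain std::vector.
using Vec = std::vector<float>;

struct Mat {
  std::size_t rows, cols;
  std::vector<float> data; // row-major, rows * cols elements
  float &at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// y = M * x
Vec matVecMul(const Mat &m, const Vec &x) {
  assert(x.size() == m.cols);
  Vec y(m.rows, 0.0f);
  for (std::size_t r = 0; r < m.rows; ++r)
    for (std::size_t c = 0; c < m.cols; ++c)
      y[r] += m.at(r, c) * x[c];
  return y;
}

// y = M * x + bias
Vec matVecMulAdd(const Mat &m, const Vec &x, const Vec &bias) {
  assert(bias.size() == m.rows);
  Vec y = matVecMul(m, x);
  for (std::size_t r = 0; r < m.rows; ++r)
    y[r] += bias[r];
  return y;
}

// M += a * b^T (outer product accumulated into an existing matrix)
void outerProductAccumulate(Mat &m, const Vec &a, const Vec &b) {
  assert(a.size() == m.rows && b.size() == m.cols);
  for (std::size_t r = 0; r < m.rows; ++r)
    for (std::size_t c = 0; c < m.cols; ++c)
      m.at(r, c) += a[r] * b[c];
}

// dst[offset + i] += v[i] (element-wise accumulation at an offset)
void vectorAccumulate(Vec &dst, std::size_t offset, const Vec &v) {
  assert(offset + v.size() <= dst.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    dst[offset + i] += v[i];
}
```

In this model the first two functions correspond to the inference path (a layer's forward multiply, with or without bias), and the two accumulate functions correspond to the training path, where updates are summed into existing storage.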

Contributor

github-actions bot commented Apr 1, 2025

✅ With the latest revision this PR passed the Python code formatter.

@anupamachandra anupamachandra marked this pull request as ready for review April 1, 2025 19:26
@anupamachandra anupamachandra requested a review from a team as a code owner April 1, 2025 19:26
@damyanp
Member

damyanp commented Apr 2, 2025

NOTE: this is a general issue with long vectors tracked by #7297. I'll keep this comment here since it has an interesting case we might want to test in it.

This applies to all the builtins I've tried so far, but the VectorAccumulate example is quite minimal. Given this code:

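```hlsl
// Declaration inferred from the generated IR, which names RWBuf as a RWByteAddressBuffer.
RWByteAddressBuffer RWBuf;

export void TruncatedVector(vector<half, 254> Input254, vector<half, 255> Input255) {
  __builtin_VectorAccumulate(Input254, RWBuf, 0);
  __builtin_VectorAccumulate(Input255, RWBuf, 0);
}
```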

This generates:

```llvm
; Function Attrs: nounwind
define void @"\01?TruncatedVector@@YAXV?$vector@$halff@$0PO@@@V?$vector@$halff@$0PP@@@@Z"(<254 x float> %Input254, <255 x float> %Input255) #0 {
  %1 = load %dx.types.Handle, %dx.types.Handle* @"\01?RWBuf@@3URWByteAddressBuffer@@A", align 4
  %2 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1)  ; CreateHandleForLib(Resource)
  %3 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %2, %dx.types.ResourceProperties { i32 4107, i32 0 })  ; AnnotateHandle(res,props)  resource: RWByteAddressBuffer
  call void @dx.op.vectorAccumulate.v254f32(i32 308, <254 x float> %Input254, %dx.types.Handle %3, i32 0)  ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
  %4 = shufflevector <255 x float> %Input255, <255 x float> undef, <1 x i32> zeroinitializer
  %5 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1)  ; CreateHandleForLib(Resource)
  %6 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %5, %dx.types.ResourceProperties { i32 4107, i32 0 })  ; AnnotateHandle(res,props)  resource: RWByteAddressBuffer
  call void @dx.op.vectorAccumulate.v1f32(i32 308, <1 x float> %4, %dx.types.Handle %6, i32 0)  ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
  ret void
}
```

Note how Input255 is explicitly truncated to `<1 x float>` by the `shufflevector` before vectorAccumulate is called, so only its first element reaches the op. Input254 is not truncated and is passed at its full width.
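
To make the truncation concrete, here is a small C++ model of what a `shufflevector` with a `<1 x i32> zeroinitializer` mask does. This is a simplified sketch: `shuffleVector` is a hypothetical helper, and undef operands or undef mask elements are not modelled (the undef second operand is stood in for by zeros). The point is that the result width equals the mask width, so a one-element mask holding index 0 keeps only element 0 of the 255-element input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of LLVM's shufflevector:
//   result[i] = concat(a, b)[mask[i]]
// The result has exactly mask.size() elements, whatever the input widths are.
// Assumption: undef inputs and undef mask entries are not modelled.
std::vector<float> shuffleVector(const std::vector<float> &a,
                                 const std::vector<float> &b,
                                 const std::vector<std::size_t> &mask) {
  assert(a.size() == b.size()); // both operands share one vector type
  std::vector<float> result;
  result.reserve(mask.size());
  for (std::size_t idx : mask) {
    assert(idx < a.size() + b.size());
    result.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
  }
  return result;
}
```

Calling this with a 255-element first operand and the mask `{0}` yields a single-element vector, which matches the `<1 x float> %4` passed to `@dx.op.vectorAccumulate.v1f32` in the IR above.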

Contributor

github-actions bot commented Apr 2, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@tex3d tex3d left a comment

Besides the generated content, I think this looks good. Just one small nit regarding DXIL Op descriptions.

I believe the generated content is out of date/incorrect, since I noticed some deleted operations and a missing .json file update. In any case, generated files will need to be updated before the final PR is ready for merging.

@tex3d
Contributor

tex3d commented Apr 18, 2025

Just an FYI:
Our gcc pipelines started failing today because of a Docker image update that updates Ubuntu (ultimately to Ubuntu 24.04 by 5/9), which will require us to bump our gcc version by a few releases. A fix is in the works, but in the meantime I think we should override this failure and merge once the other pipelines are complete, provided there are no other failures.

@damyanp damyanp merged commit 1db8c5b into microsoft:staging-sm6.9 Apr 18, 2025
9 of 12 checks passed
@github-project-automation github-project-automation bot moved this from New to Done in HLSL Roadmap Apr 18, 2025
@damyanp damyanp moved this from Needs Review to Closed in HLSL Support Apr 22, 2025
tex3d added a commit to tex3d/DirectXShaderCompiler that referenced this pull request Apr 25, 2025
…ics (microsoft#7290)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Damyan Pepper <[email protected]>
Co-authored-by: Simon Moll <[email protected]>
Co-authored-by: Tex Riddell <[email protected]>
Co-authored-by: Chris B <[email protected]>
(cherry picked from commit 1db8c5b)