Implementation of the CoopVec Inference and Training builtin intrinsics #7290
✅ With the latest revision this PR passed the Python code formatter.
This applies to all the builtins I've tried so far, but the VectorAccumulate example is quite minimal. Given this code:
```hlsl
// Declaration not shown in the original snippet; the type is taken from the generated DXIL.
RWByteAddressBuffer RWBuf;

export void TruncatedVector(vector<half, 254> Input254, vector<half, 255> Input255) {
  __builtin_VectorAccumulate(Input254, RWBuf, 0);
  __builtin_VectorAccumulate(Input255, RWBuf, 0);
}
```
This generates:
```llvm
; Function Attrs: nounwind
define void @"\01?TruncatedVector@@YAXV?$vector@$halff@$0PO@@@V?$vector@$halff@$0PP@@@@Z"(<254 x float> %Input254, <255 x float> %Input255) #0 {
%1 = load %dx.types.Handle, %dx.types.Handle* @"\01?RWBuf@@3URWByteAddressBuffer@@A", align 4
%2 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1) ; CreateHandleForLib(Resource)
%3 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %2, %dx.types.ResourceProperties { i32 4107, i32 0 }) ; AnnotateHandle(res,props) resource: RWByteAddressBuffer
call void @dx.op.vectorAccumulate.v254f32(i32 308, <254 x float> %Input254, %dx.types.Handle %3, i32 0) ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
%4 = shufflevector <255 x float> %Input255, <255 x float> undef, <1 x i32> zeroinitializer
%5 = call %dx.types.Handle @dx.op.createHandleForLib.dx.types.Handle(i32 160, %dx.types.Handle %1) ; CreateHandleForLib(Resource)
%6 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %5, %dx.types.ResourceProperties { i32 4107, i32 0 }) ; AnnotateHandle(res,props) resource: RWByteAddressBuffer
call void @dx.op.vectorAccumulate.v1f32(i32 308, <1 x float> %4, %dx.types.Handle %6, i32 0) ; VectorAccumulate(inputVector,arrayBuffer,arrayOffset)
ret void
}
```
Note how Input255 is explicitly truncated to <1 x float> before vectorAccumulate is called, while Input254 is not truncated.
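One way to spot this kind of truncation without reading the IR by hand is to scan the disassembly for `vectorAccumulate` calls whose operand comes from a narrowing `shufflevector`. The following is only a rough sketch: the helper names and regular expressions are hypothetical, are not part of DXC, and assume the textual layout shown in the IR above.

```python
import re

# Lines excerpted from the generated IR quoted in the comment above.
SAMPLE_IR = """
call void @dx.op.vectorAccumulate.v254f32(i32 308, <254 x float> %Input254, %dx.types.Handle %3, i32 0)
%4 = shufflevector <255 x float> %Input255, <255 x float> undef, <1 x i32> zeroinitializer
call void @dx.op.vectorAccumulate.v1f32(i32 308, <1 x float> %4, %dx.types.Handle %6, i32 0)
"""

# Hypothetical patterns; they assume the exact textual form seen in SAMPLE_IR.
CALL_RE = re.compile(
    r"@dx\.op\.vectorAccumulate\.v\d+\w+\(i32 \d+, <(\d+) x \w+> (%[\w.]+)"
)
SHUFFLE_RE = re.compile(
    r"(%[\w.]+) = shufflevector <(\d+) x \w+> (%[\w.]+), .*<(\d+) x i32>"
)


def narrowing_shuffles(ir_text):
    """Map each shufflevector result to (source, source width, result width)
    when the result has fewer elements than the source vector."""
    found = {}
    for m in SHUFFLE_RE.finditer(ir_text):
        source_width, mask_width = int(m.group(2)), int(m.group(4))
        # The result of shufflevector has as many elements as its mask.
        if mask_width < source_width:
            found[m.group(1)] = (m.group(3), source_width, mask_width)
    return found


def truncated_accumulates(ir_text):
    """List (source value, source width, passed width) for every
    vectorAccumulate call that is fed by a narrowing shufflevector."""
    shuffles = narrowing_shuffles(ir_text)
    report = []
    for m in CALL_RE.finditer(ir_text):
        passed_width, operand = int(m.group(1)), m.group(2)
        if operand in shuffles:
            source, source_width, _ = shuffles[operand]
            report.append((source, source_width, passed_width))
    return report


print(truncated_accumulates(SAMPLE_IR))  # [('%Input255', 255, 1)]
```

On the sample lines, only the `Input255` call is reported, which matches the observation that `Input254` reaches the op at full width.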
✅ With the latest revision this PR passed the C/C++ code formatter.
Co-authored-by: Damyan Pepper <[email protected]>
Besides the generated content, I think this looks good. Just one small nit regarding DXIL Op descriptions.
I believe the generated content is out of date/incorrect, since I noticed some deleted operations and a missing .json
file update. In any case, generated files will need to be updated before the final PR is ready for merging.
…r Float8(E4M3 and E5M2) MatrixInterpretation."
…ller type. The declared input type must be 32-bit unsigned integer.
non-overload test)
…validation errors per review feedback, some cleanup
…taccumulate and vector accumulate functions
Just an FYI:
…ics (microsoft#7290)

Implements
HLSL:
__builtin_MatVecMul
__builtin_MatVecMulAdd
__builtin_OuterProductAccumulate
__builtin_VectorAccumulate
Lowered to
DXIL:
@dx.op.matVecMul
@dx.op.matVecMulAdd
@dx.op.outerProductAccumulate
@dx.op.vectorAccumulate

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Damyan Pepper <[email protected]>
Co-authored-by: Simon Moll <[email protected]>
Co-authored-by: Tex Riddell <[email protected]>
Co-authored-by: Chris B <[email protected]>
(cherry picked from commit 1db8c5b)
Implements
HLSL:
__builtin_MatVecMul
__builtin_MatVecMulAdd
__builtin_OuterProductAccumulate
__builtin_VectorAccumulate
Lowered to
DXIL:
@dx.op.matVecMul
@dx.op.matVecMulAdd
@dx.op.outerProductAccumulate
@dx.op.vectorAccumulate
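The four pairs listed above follow a visible naming pattern: drop the `__builtin_` prefix, lower-case the first letter, and prepend `@dx.op.`. The sketch below encodes that pattern as a lookup aid when searching disassembly. The function name is hypothetical, and the rule is inferred from this list only; it is not a documented DXC convention and may not hold for other builtins.

```python
def dxil_op_for_builtin(builtin):
    """Guess the DXIL op name for one of the CoopVec builtins listed above.

    Assumption: the name mapping is inferred from the four pairs in this PR
    description and is not guaranteed for any other builtin.
    """
    prefix = "__builtin_"
    name = builtin[len(prefix):] if builtin.startswith(prefix) else ""
    if not name:
        raise ValueError(f"not a __builtin_ name: {builtin!r}")
    # Lower-case only the first character: MatVecMul -> matVecMul.
    return "@dx.op." + name[0].lower() + name[1:]


for hlsl_name in (
    "__builtin_MatVecMul",
    "__builtin_MatVecMulAdd",
    "__builtin_OuterProductAccumulate",
    "__builtin_VectorAccumulate",
):
    print(hlsl_name, "->", dxil_op_for_builtin(hlsl_name))
```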