-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Update examples in waveballot.md and waveactivecountbits.md #2077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: docs
Are you sure you want to change the base?
Conversation
This example was incorrectly copied from the [spec](https://github.com/microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics#uint-waveactivecountbits-bool-bbit-), replacing `countbits` with `WaveActiveCountBits`.
@damyanp : Thanks for your contribution! The author(s) and reviewer(s) have been notified to review your proposed change. |
|
||
``` syntax | ||
result = WaveActiveCountBits( WaveActiveBallot( bBit ) ); | ||
result = countbits( WaveActiveBallot( bBit ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an improvement, but still not quite correct HLSL. It needs to be more like this:
result = countbits( WaveActiveBallot( bBit ) ); | |
uint4 counts = countbits( WaveActiveBallot( bBit ) ); | |
result = counts.x + counts.y + counts.z + counts.w; |
This is because WaveActiveBallow
returns a uint4 in order to have one bit for every potential lane in the widest possible lane width of 128.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Updated.
|
||
## Examples | ||
|
||
This can be implemented more efficiently than a full WaveActiveSum, as described in the following example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this line should be clarified that this is meant to demonstrate what this operation does:
This can be implemented more efficiently than a full WaveActiveSum, as described in the following example: | |
This can be implemented more efficiently than a full WaveActiveSum, in a way similar to this equivalent code: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried another way to put this, hopefully this is clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better!
One example was incorrectly copied from the spec, replacing
countbits
withWaveActiveCountBits
. In both cases they were relying on an implicit vector truncation which would result in incorrect results on devices with wave size > 32.