@damyanp damyanp commented Jul 11, 2025

One example was incorrectly copied from the spec, replacing `countbits` with `WaveActiveCountBits`. In both cases the code relied on an implicit vector truncation, which gives incorrect results on devices with a wave size greater than 32.
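
To make the truncation concrete, here is a minimal HLSL sketch; it is not code from the PR. The helper names `CountTruncated` and `CountAllLanes` are hypothetical, and the sketch assumes Shader Model 6.0 or later, where `WaveActiveBallot` returns a `uint4`.

```hlsl
// Hypothetical helpers, for illustration only.

// Mirrors what assigning countbits(WaveActiveBallot(...)) to a scalar
// effectively computes: the implicit truncation keeps only the .x
// component, so only lanes 0-31 are counted.
uint CountTruncated(bool bBit)
{
    uint4 ballot = WaveActiveBallot(bBit);
    return countbits(ballot.x);
}

// Counts set bits across all four components, covering lanes 0-127.
uint CountAllLanes(bool bBit)
{
    uint4 counts = countbits(WaveActiveBallot(bBit));
    return counts.x + counts.y + counts.z + counts.w;
}
```

On a fully active 64-lane wave where every lane passes `true`, the first helper would return 32 and the second 64. On waves of 32 lanes or fewer the two agree, which is why the problem is easy to miss.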

This example was incorrectly copied from the [spec](https://github.com/microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics#uint-waveactivecountbits-bool-bbit-), replacing `countbits` with `WaveActiveCountBits`.

@damyanp : Thanks for your contribution! The author(s) and reviewer(s) have been notified to review your proposed change.


``` syntax
- result = WaveActiveCountBits( WaveActiveBallot( bBit ) );
+ result = countbits( WaveActiveBallot( bBit ) );
```

@tex3d tex3d Jul 11, 2025

This is an improvement, but still not quite correct HLSL. It needs to be more like this:

Suggested change (replace the first line below with the two lines that follow it):
result = countbits( WaveActiveBallot( bBit ) );
uint4 counts = countbits( WaveActiveBallot( bBit ) );
result = counts.x + counts.y + counts.z + counts.w;

This is because `WaveActiveBallot` returns a `uint4`, so that there is one bit for every potential lane at the widest possible wave size of 128 lanes.
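
As a rough sketch of that layout, the helper below tests whether a given lane has its bit set in a ballot result. The name `LaneVoted` is hypothetical, and the sketch assumes that lane N maps to bit N of the 128-bit mask (lanes 0-31 in `.x`, 32-63 in `.y`, and so on) and that `lane` is less than 128.

```hlsl
// Hypothetical helper, for illustration only.
// Returns true if `lane` has its bit set in `ballot`.
// Assumes lane < 128 and that lane N corresponds to bit N of the mask.
bool LaneVoted(uint4 ballot, uint lane)
{
    uint component = lane / 32;  // 0 -> .x, 1 -> .y, 2 -> .z, 3 -> .w
    uint bit = lane % 32;        // bit position within that component
    return ((ballot[component] >> bit) & 1u) != 0;
}
```

Under these assumptions, any lane with an index of 32 or higher lives outside `.x`, so code that reads only the first component never sees it.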

Contributor Author

Thanks! Updated.


## Examples

This can be implemented more efficiently than a full WaveActiveSum, as described in the following example:
Contributor

Maybe this line should make clear that the code is meant to demonstrate what this operation does:

Suggested change (replace the first line below with the second):
This can be implemented more efficiently than a full WaveActiveSum, as described in the following example:
This can be implemented more efficiently than a full WaveActiveSum, in a way similar to this equivalent code:
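
For readers unfamiliar with the comparison to `WaveActiveSum`, here is a minimal compute-shader sketch of the relationship; it is not taken from the documentation under review. The buffer `Output`, the predicate, and the thread-group size are assumptions chosen for illustration.

```hlsl
// Hypothetical output buffer, for illustration only.
RWStructuredBuffer<uint> Output : register(u0);

[numthreads(64, 1, 1)]
void main(uint3 dtid : SV_DispatchThreadID)
{
    // Arbitrary per-lane predicate: true on odd thread IDs.
    bool bBit = (dtid.x & 1) != 0;

    // Number of active lanes in the wave for which bBit is true.
    uint viaCountBits = WaveActiveCountBits(bBit);

    // The same count expressed as a general wave-wide sum of 0s and 1s.
    uint viaSum = WaveActiveSum(bBit ? 1u : 0u);

    // Record whether the two approaches agree for this lane.
    Output[dtid.x] = (viaCountBits == viaSum) ? 1u : 0u;
}
```

Both expressions should produce the same count; the point of the sentence under review is that the dedicated intrinsic can get there with a ballot and a bit count instead of a full reduction.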

Contributor Author

Tried another way to put this, hopefully this is clearer?

@tex3d tex3d left a comment

See comments.

@damyanp damyanp changed the title from "Update waveactivecountbits.md" to "Update examples in waveballot.md and waveactivecountbits.md" on Jul 11, 2025

@tex3d tex3d left a comment

Much better!
