Ineffectual bitwise or with constant emitted for mask operand of vperm(b|w|d|q|ps|pd) #106413
Comments
@llvm/issue-subscribers-backend-x86 Author: Cristian Vîjdea (cvijdea-bd)
Same thing as #106256, but it also happens for the (AVX2/AVX-512) permute[x]var intrinsics, while PR #106377 seems to fix it only for (v)pshufb.
Godbolt examples: https://godbolt.org/z/MsTcx7qYc
The vector permute intrinsics ignore all index bits except the ones that match the required index size (for example, vpermd on a 256-bit vector has 8 elements, so it reads only the low 3 bits of each 32-bit index).
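The snippet that originally followed this sentence is not preserved here. As a stand-in, here is a minimal scalar sketch in C of the behaviour being described. It is not the code from the Godbolt link. The function names `model_vpermd8` and `model_vpermd8_or` are hypothetical, and the model assumes the 8 x 32-bit permute reachable through `_mm256_permutevar8x32_epi32`.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of an 8 x 32-bit variable permute (vpermd semantics).
   Only the low 3 bits of each index select a source element;
   bits 3..31 of the index are ignored. */
void model_vpermd8(const uint32_t src[8], const uint32_t idx[8],
                   uint32_t dst[8])
{
    assert(src != dst); /* this model does not support in-place use */
    for (int i = 0; i < 8; ++i)
        dst[i] = src[idx[i] & 7u];
}

/* Hypothetical helper: OR every index with a constant before permuting,
   mirroring the source-level pattern that leaves a redundant OR behind. */
void model_vpermd8_or(const uint32_t src[8], const uint32_t idx[8],
                      uint32_t or_bits, uint32_t dst[8])
{
    uint32_t tmp[8];
    for (int i = 0; i < 8; ++i)
        tmp[i] = idx[i] | or_bits;
    model_vpermd8(src, tmp, dst);
}
```

When `or_bits` has none of its low 3 bits set, both functions produce the same result for every input, which is why the OR can be dropped.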
The OR operations with unrelated bits should be optimized out.
This probably applies to vpermt2 (e.g. _mm512_permutex2var_epi16) as well, with one more bit used since it selects from two concatenated vectors.
cc @RKSimon
vpermilpd/vpermilps will need support as well - vpermilpd is annoying as it doesn't use the lsb for the index
… values VPERMILPS uses the lower bits 0-1 (to index per-lane i32/f32 0-3); VPERMILPD uses bit 1 (to index per-lane i64/f64 0-1). Use SimplifyDemandedBits to ignore anything touching the remaining bits. Part of #106413
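To make the bit positions concrete, here is a small scalar sketch in C of how each control element is decoded. The function names are hypothetical, and this is an illustration of the instruction semantics, not the LLVM change itself.

```c
#include <stdint.h>

/* VPERMILPS: each 32-bit control element selects one of the 4 floats in
   the same 128-bit lane, using control bits 1:0. */
unsigned vpermilps_select(uint32_t ctrl)
{
    return ctrl & 3u;
}

/* VPERMILPD: each 64-bit control element selects one of the 2 doubles in
   the same 128-bit lane, using control bit 1. Bit 0 is ignored. */
unsigned vpermilpd_select(uint64_t ctrl)
{
    return (unsigned)((ctrl >> 1) & 1u);
}

/* Model of one 128-bit lane of a variable VPERMILPD. */
void model_vpermilpd_lane(const double src[2], const uint64_t ctrl[2],
                          double dst[2])
{
    double s0 = src[0], s1 = src[1]; /* copy first so src may alias dst */
    dst[0] = vpermilpd_select(ctrl[0]) ? s1 : s0;
    dst[1] = vpermilpd_select(ctrl[1]) ? s1 : s0;
}
```

Because VPERMILPD skips bit 0, OR-ing a control value with 1 never changes the selected element, whereas for VPERMILPS the same OR would.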
…V3 mask values VPERMV/VPERMV3 only use the lower bits of the vector element indices, so use SimplifyDemandedBits to ignore anything touching the remaining bits. Fixes llvm#106413
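As a rough illustration of which index bits count as "the lower bits", the sketch below computes them from the element count and the number of source vectors, then models a two-source permute. The helper names are hypothetical and this is not LLVM's implementation. It assumes a power-of-two element count and the semantics of `_mm512_permutex2var_epi16` for the two-source case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of low index bits a variable permute reads.
   One source (VPERMV-style) needs log2(num_elts) bits; two sources
   (VPERMV3-style) need one extra bit to pick the source vector. */
unsigned demanded_index_bits(unsigned num_elts, unsigned num_sources)
{
    assert(num_elts != 0 && (num_elts & (num_elts - 1)) == 0);
    assert(num_sources == 1 || num_sources == 2);
    unsigned bits = 0;
    while ((1u << bits) < num_elts)
        ++bits;
    return bits + (num_sources == 2 ? 1u : 0u);
}

/* Mask of the index bits that matter; everything else can be ignored. */
uint64_t demanded_index_mask(unsigned num_elts, unsigned num_sources)
{
    return (UINT64_C(1) << demanded_index_bits(num_elts, num_sources)) - 1;
}

/* Scalar model of a two-source permute over 32 x 16-bit elements:
   index bits 4:0 pick the element, bit 5 picks the source (0 = a, 1 = b). */
void model_permutex2var_epi16(const uint16_t a[32], const uint16_t idx[32],
                              const uint16_t b[32], uint16_t dst[32])
{
    for (int i = 0; i < 32; ++i) {
        unsigned off = idx[i] & 31u;
        dst[i] = (idx[i] & 32u) ? b[off] : a[off];
    }
}
```

For example, `demanded_index_mask(32, 2)` is `0x3F`, so an OR that only sets index bits 6 and above cannot affect the result of the 32-element two-source permute.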
@cvijdea-bd Just so you know, the |
Yeah I noticed that while looking over your fix, great stuff...