Ineffectual por with constant emitted for pshufb operand #106256
Comments
@llvm/issue-subscribers-backend-x86

Author: Cristian Vîjdea (cvijdea-bd)

Clang example: https://godbolt.org/z/jco9dn95W, flags: `-O3 -march=x86-64-v2`

```cpp
#include <immintrin.h>
extern "C" __m128i shuffle_or(__m128i bytes, __m128i idxs) {
    return _mm_shuffle_epi8(bytes, _mm_or_si128(idxs, _mm_set1_epi8(112)));
}
```

```asm
.LCPI0_0:
        .zero 16,112
shuffle_or:
        por xmm1, xmmword ptr [rip + .LCPI0_0]
        pshufb xmm0, xmm1
        ret
```
(V)PSHUFB only uses the sign bit (for zeroing) and the lower 4 bits (to index per-lane byte 0-15) - so use SimplifyDemandedBits to ignore anything touching the remaining bits. Fixes llvm#106256
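The demanded-bits argument in the commit message can be sketched in a few lines of C++. This is only an illustration of the reasoning, not LLVM's `SimplifyDemandedBits` code; the names `kPshufbDemandedBits`, `or_is_redundant` and `keep_demanded_bits` are hypothetical helpers made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Bits of each PSHUFB index byte that can affect the result:
// bit 7 (zero the output byte) and bits 0-3 (select a source byte).
constexpr std::uint8_t kPshufbDemandedBits = 0x8F;

// Hypothetical helper (not an LLVM API): OR-ing with a constant is
// redundant when the constant sets none of the demanded bits.
constexpr bool or_is_redundant(std::uint8_t or_constant, std::uint8_t demanded) {
    return (or_constant & demanded) == 0;
}

// Hypothetical helper: reduce a constant index byte to the bits that matter.
constexpr std::uint8_t keep_demanded_bits(std::uint8_t index, std::uint8_t demanded) {
    return static_cast<std::uint8_t>(index & demanded);
}

// Small self-check covering the values mentioned in this issue.
void check_examples() {
    assert(or_is_redundant(112, kPshufbDemandedBits));           // 0b0111_0000 touches no demanded bit
    assert(!or_is_redundant(0x80, kPshufbDemandedBits));         // the sign bit is demanded
    assert(keep_demanded_bits(127, kPshufbDemandedBits) == 15);  // 127 and 15 index the same byte
}
```

Under this model, the `por` with 112 from the report only changes bits that the shuffle never reads, which is why it can be dropped.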
Thanks for looking into this so quickly!
Clang example: https://godbolt.org/z/ec4P4j78b, flags: `-O3 -march=x86-64-v2`. Not clang specific, same behaviour on rust nightly.

The `por` of xmm1 with 112 (`0b0111_0000`) is a no-op and should be optimized out, as `pshufb` ignores bits 4-6 (counting from bit 0) of each mask byte.

Writing `_mm_shuffle_epi8(bytes, _mm_set1_epi8(127))` in the source emits a `pshufb` with `15` in the assembly, so it seems like LLVM is aware of this optimization on some level, but fails to apply it here.
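The no-op claim can be checked exhaustively with a scalar model of the instruction. The sketch below assumes the documented 128-bit `pshufb` behaviour (bit 7 of an index byte zeroes the output byte, otherwise the low four bits select a source byte). `Vec16`, `pshufb_model` and `or_with_112_is_noop` are names invented for this example, and the model does not use the real intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Scalar model of 128-bit PSHUFB: bit 7 of an index byte zeroes the output
// byte; otherwise the low four bits pick one of the 16 source bytes.
Vec16 pshufb_model(const Vec16& bytes, const Vec16& idxs) {
    Vec16 out{};
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = (idxs[i] & 0x80) ? 0 : bytes[idxs[i] & 0x0F];
    }
    return out;
}

// Tries every possible index byte value and reports whether OR-ing it
// with 112 ever changes the shuffle result.
bool or_with_112_is_noop() {
    Vec16 bytes{};
    for (std::size_t i = 0; i < 16; ++i) {
        bytes[i] = static_cast<std::uint8_t>(0xA0 + i);  // distinct source bytes
    }
    for (int v = 0; v < 256; ++v) {
        Vec16 plain{};
        Vec16 ored{};
        plain.fill(static_cast<std::uint8_t>(v));
        ored.fill(static_cast<std::uint8_t>(v | 112));
        if (pshufb_model(bytes, plain) != pshufb_model(bytes, ored)) {
            return false;
        }
    }
    return true;
}

// Self-check: the OR from the issue never alters the modelled shuffle.
void check_issue_example() {
    assert(or_with_112_is_noop());
}
```

Because each output byte depends only on its own index byte, testing all 256 index values is enough to cover every input to `shuffle_or` within this model.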