Commit 1684c65

[X86] Fix logic for optimizing movmsk(bitcast(shuffle(x))); PR67287
Prior logic would remove the shuffle iff all of the elements in `x` were used. This is incorrect. `movmsk` only cares about the high bits, so if the elements of `x` are narrower than the elements of the `movmsk`, then the shuffle, even if it preserves all the elements, may change which ones supply the high bits.

For example: `movmsk64(bitcast(shuffle32(x, (1,0,3,2))))`. Even though the shuffle mask `(1,0,3,2)` preserves all the elements, it flips which ones are relevant to the `movmsk64` (x[1] and x[3] before and x[0] and x[2] after).

The fix is to ensure that the shuffle mask can be scaled to the element width of the `movmsk` instruction. This ensures that the "high" elements stay "high". This is overly conservative, as it misses cases like `(1,1,3,3)` where the "high" elements stay intact despite the mask not being scalable, but for a relatively edge-case optimization that should generally be handled during simplifyDemandedBits, it seems okay.
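The failure mode described above can be reproduced with a small standalone model. This is only a sketch under stated assumptions: `Vec4x32`, `shuffle32`, and `movmsk64` are hypothetical names for this illustration (not LLVM or intrinsic APIs), and the model assumes little-endian lane order, so the sign bit of 64-bit element `i` is the top bit of 32-bit lane `2*i + 1`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model type: a 128-bit vector viewed as four 32-bit lanes.
using Vec4x32 = std::array<std::uint32_t, 4>;

// Hypothetical model of a 4-lane shuffle: result[i] = x[mask[i]].
Vec4x32 shuffle32(const Vec4x32 &x, const std::array<int, 4> &mask) {
  Vec4x32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = x[mask[i]];
  return r;
}

// Hypothetical model of movmsk over 2 x i64 (bitcast from 4 x i32).
// Assumes little-endian lane order: bit i of the result is the top bit
// of 32-bit lane 2*i + 1, i.e. only the "high" lanes are observed.
unsigned movmsk64(const Vec4x32 &x) {
  unsigned m = 0;
  for (int i = 0; i < 2; ++i)
    m |= ((x[2 * i + 1] >> 31) & 1u) << i;
  return m;
}
```

With `x = {0x80000000, 0, 0x80000000, 0}`, `movmsk64(x)` yields 0 while `movmsk64(shuffle32(x, {{1, 0, 3, 2}}))` yields 3, even though the mask references every lane of `x`. Dropping the shuffle would therefore change the result.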
1 parent 65a576e commit 1684c65

2 files changed

+22
-6
lines changed

llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 17 additions & 2 deletions
@@ -45834,13 +45834,28 @@ static SDValue combineSetCCMOVMSK(SDValue EFLAGS, X86::CondCode &CC,
   }
 
   // MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) iff every element is referenced.
-  SmallVector<int, 32> ShuffleMask;
+  // Since we peek through a bitcast, we need to be careful if the base vector
+  // type has smaller elements than the MOVMSK type. In that case, even if
+  // all the elements are demanded by the shuffle mask, only the "high"
+  // elements which have highbits that align with highbits in the MOVMSK vec
+  // elements are actually demanded. A simplification of spurious operations
+  // on the "low" elements take place during other simplifications.
+  //
+  // For example:
+  // MOVMSK64(BITCAST(SHUF32 X, (1,0,3,2))) even though all the elements are
+  // demanded, because we are swapping around the result can change.
+  //
+  // To address this, we check that we can scale the shuffle mask to MOVMSK
+  // element width (this will ensure "high" elements match). Its slightly overly
+  // conservative, but fine for an edge case fold.
+  SmallVector<int, 32> ShuffleMask, ScaledMaskUnused;
   SmallVector<SDValue, 2> ShuffleInputs;
   if (NumElts <= CmpBits &&
       getTargetShuffleInputs(peekThroughBitcasts(Vec), ShuffleInputs,
                              ShuffleMask, DAG) &&
       ShuffleInputs.size() == 1 && !isAnyZeroOrUndef(ShuffleMask) &&
-      ShuffleInputs[0].getValueSizeInBits() == VecVT.getSizeInBits()) {
+      ShuffleInputs[0].getValueSizeInBits() == VecVT.getSizeInBits() &&
+      scaleShuffleElements(ShuffleMask, NumElts, ScaledMaskUnused)) {
     unsigned NumShuffleElts = ShuffleMask.size();
     APInt DemandedElts = APInt::getZero(NumShuffleElts);
     for (int M : ShuffleMask) {
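
To illustrate what "scaling the shuffle mask to the MOVMSK element width" means for the widening case this fix relies on, here is a simplified standalone sketch. It is not the implementation of `scaleShuffleElements`; `canWidenMask` is a hypothetical helper written for this illustration, and it assumes a mask with no undef or zero entries (the combine already rejects those via `isAnyZeroOrUndef`). A narrow mask widens only if each group of `scale` consecutive entries selects one aligned, contiguous wide element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): try to rewrite `mask`, which indexes
// narrow elements, as a mask over `numWideElts` wider elements.
// Returns true and fills `wide` on success; returns false otherwise.
bool canWidenMask(const std::vector<int> &mask, std::size_t numWideElts,
                  std::vector<int> &wide) {
  wide.clear();
  if (numWideElts == 0 || mask.size() % numWideElts != 0)
    return false;
  const std::size_t scale = mask.size() / numWideElts;
  for (std::size_t i = 0; i < mask.size(); i += scale) {
    const int first = mask[i];
    // The group must start on a wide-element boundary.
    if (first < 0 || first % static_cast<int>(scale) != 0) {
      wide.clear();
      return false;
    }
    // The rest of the group must follow consecutively.
    for (std::size_t j = 1; j < scale; ++j) {
      if (mask[i + j] != first + static_cast<int>(j)) {
        wide.clear();
        return false;
      }
    }
    wide.push_back(first / static_cast<int>(scale));
  }
  return true;
}
```

Under this model, `(1,0,3,2)` cannot be widened to two elements, so the fold is skipped, while `(2,3,0,1)` widens to `(1,0)` and remains eligible. `(1,1,3,3)` is also rejected, which matches the conservative case acknowledged in the commit message.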

llvm/test/CodeGen/X86/movmsk-cmp.ll

Lines changed: 5 additions & 4 deletions
@@ -4434,13 +4434,14 @@ define i32 @PR39665_c_ray_opt(<2 x double> %x, <2 x double> %y) {
 define i32 @pr67287(<2 x i64> %broadcast.splatinsert25) {
 ; SSE2-LABEL: pr67287:
 ; SSE2: # %bb.0: # %entry
-; SSE2-NEXT: movl $3, %eax
-; SSE2-NEXT: testl %eax, %eax
-; SSE2-NEXT: jne .LBB97_2
-; SSE2-NEXT: # %bb.1: # %entry
 ; SSE2-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; SSE2-NEXT: pxor %xmm1, %xmm1
 ; SSE2-NEXT: pcmpeqd %xmm0, %xmm1
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,3,2]
+; SSE2-NEXT: movmskpd %xmm0, %eax
+; SSE2-NEXT: testl %eax, %eax
+; SSE2-NEXT: jne .LBB97_2
+; SSE2-NEXT: # %bb.1: # %entry
 ; SSE2-NEXT: movd %xmm1, %eax
 ; SSE2-NEXT: testb $1, %al
 ; SSE2-NEXT: jne .LBB97_2
