[LV] Support generating masks for switch terminators. #99808
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes: Update createEdgeMask to create masks where the terminator in Src is a switch. We need to handle 2 separate cases:
1. Dst is not the default destination: Dst is reached if any of the cases with destination == Dst are taken; join the conditions for those cases using a logical OR.
2. Dst is the default destination: Dst is reached if none of the cases with destination != Dst are taken; join the conditions for those cases using a logical OR and negate the result.
Fixes #48188. Patch is 84.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99808.diff 7 Files Affected:
diff --git a/clang/test/Frontend/optimization-remark-analysis.c b/clang/test/Frontend/optimization-remark-analysis.c
index e43984942a6ef..9d8917265a320 100644
--- a/clang/test/Frontend/optimization-remark-analysis.c
+++ b/clang/test/Frontend/optimization-remark-analysis.c
@@ -1,7 +1,7 @@
// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -emit-llvm -Rpass-analysis -S %s -o - 2>&1 | FileCheck %s --check-prefix=RPASS
// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -emit-llvm -S %s -o - 2>&1 | FileCheck %s
-// RPASS: {{.*}}:12:5: remark: loop not vectorized: loop contains a switch statement
+// RPASS-NOT: {{.*}}:12:5: remark: loop not vectorized
// CHECK-NOT: remark: loop not vectorized: loop contains a switch statement
double foo(int N, int *Array) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index f54eebb2874ab..7f84455150093 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1348,11 +1348,11 @@ bool LoopVectorizationLegality::canVectorizeWithIfConvert() {
// Collect the blocks that need predication.
for (BasicBlock *BB : TheLoop->blocks()) {
// We don't support switch statements inside loops.
- if (!isa<BranchInst>(BB->getTerminator())) {
- reportVectorizationFailure("Loop contains a switch statement",
- "loop contains a switch statement",
- "LoopContainsSwitch", ORE, TheLoop,
- BB->getTerminator());
+ if (!isa<BranchInst, SwitchInst>(BB->getTerminator())) {
+ reportVectorizationFailure("Loop contains an unsupported termaintor",
+ "loop contains an unsupported terminator",
+ "LoopContainsUnsupportedTerminator", ORE,
+ TheLoop, BB->getTerminator());
return false;
}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6d28b8fabe42e..2530762e3e424 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7763,6 +7763,41 @@ VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
VPValue *SrcMask = getBlockInMask(Src);
+ if (auto *SI = dyn_cast<SwitchInst>(Src->getTerminator())) {
+ // Create mask where the terminator in Src is a switch. We need to handle 2
+ // separate cases:
+ // 1. Dst is not the default desintation. Dst is reached if any of the cases
+ // with destination == Dst are taken. Join the conditions for each case
+ // where destination == Dst using a logical OR.
+ // 2. Dst is the default destination. Dst is reached if none of the cases
+ // with destination != Dst are taken. Join the conditions for each case
+ // where the destination is != Dst using a logical OR and negate it.
+ VPValue *Mask = nullptr;
+ VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition(), Plan);
+ bool IsDefault = SI->getDefaultDest() == Dst;
+ for (auto &C : SI->cases()) {
+ if (IsDefault) {
+ if (C.getCaseSuccessor() == Dst)
+ continue;
+ } else if (C.getCaseSuccessor() != Dst)
+ continue;
+
+ VPValue *Eq = EdgeMaskCache.lookup({Src, C.getCaseSuccessor()});
+ if (!Eq) {
+ VPValue *V = getVPValueOrAddLiveIn(C.getCaseValue(), Plan);
+ Eq = Builder.createICmp(CmpInst::ICMP_EQ, Cond, V);
+ }
+ if (Mask)
+ Mask = Builder.createOr(Mask, Eq);
+ else
+ Mask = Eq;
+ }
+ if (IsDefault)
+ Mask = Builder.createNot(Mask);
+ assert(Mask && "mask must be created");
+ return EdgeMaskCache[Edge] = Mask;
+ }
+
// The terminator has to be a branch inst!
BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
assert(BI && "Unexpected terminator found");
diff --git a/llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll b/llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll
index b8ce3c40920a3..ff73a149c8e39 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll
@@ -6,9 +6,43 @@ define void @switch_default_to_latch_common_dest(ptr %start, ptr %end) {
; IC1-LABEL: define void @switch_default_to_latch_common_dest(
; IC1-SAME: ptr [[START:%.*]], ptr [[END:%.*]]) #[[ATTR0:[0-9]+]] {
; IC1-NEXT: [[ENTRY:.*]]:
+; IC1-NEXT: [[START2:%.*]] = ptrtoint ptr [[START]] to i64
+; IC1-NEXT: [[END1:%.*]] = ptrtoint ptr [[END]] to i64
+; IC1-NEXT: [[TMP0:%.*]] = add i64 [[END1]], -8
+; IC1-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
+; IC1-NEXT: [[TMP2:%.*]] = lshr i64 [[TMP1]], 3
+; IC1-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
+; IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 4
+; IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; IC1: [[VECTOR_PH]]:
+; IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 4
+; IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
+; IC1-NEXT: [[TMP4:%.*]] = mul i64 [[N_VEC]], 8
+; IC1-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP4]]
+; IC1-NEXT: br label %[[VECTOR_BODY:.*]]
+; IC1: [[VECTOR_BODY]]:
+; IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IC1-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 8
+; IC1-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], 0
+; IC1-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP5]]
+; IC1-NEXT: [[TMP6:%.*]] = getelementptr i64, ptr [[NEXT_GEP]], i32 0
+; IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP6]], align 1
+; IC1-NEXT: [[TMP7:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 -12, i64 -12, i64 -12, i64 -12>
+; IC1-NEXT: [[TMP8:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 13, i64 13, i64 13, i64 13>
+; IC1-NEXT: [[TMP9:%.*]] = or <4 x i1> [[TMP7]], [[TMP8]]
+; IC1-NEXT: [[TMP10:%.*]] = or <4 x i1> [[TMP9]], [[TMP9]]
+; IC1-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 42, i64 42, i64 42, i64 42>, ptr [[TMP6]], i32 1, <4 x i1> [[TMP10]])
+; IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; IC1-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; IC1-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; IC1: [[MIDDLE_BLOCK]]:
+; IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
+; IC1-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; IC1: [[SCALAR_PH]]:
+; IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[START]], %[[ENTRY]] ]
; IC1-NEXT: br label %[[LOOP_HEADER:.*]]
; IC1: [[LOOP_HEADER]]:
-; IC1-NEXT: [[PTR_IV:%.*]] = phi ptr [ [[START]], %[[ENTRY]] ], [ [[PTR_IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; IC1-NEXT: [[PTR_IV:%.*]] = phi ptr [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[PTR_IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
; IC1-NEXT: [[L:%.*]] = load i64, ptr [[PTR_IV]], align 1
; IC1-NEXT: switch i64 [[L]], label %[[LOOP_LATCH]] [
; IC1-NEXT: i64 -12, label %[[IF_THEN:.*]]
@@ -20,16 +54,59 @@ define void @switch_default_to_latch_common_dest(ptr %start, ptr %end) {
; IC1: [[LOOP_LATCH]]:
; IC1-NEXT: [[PTR_IV_NEXT]] = getelementptr inbounds i64, ptr [[PTR_IV]], i64 1
; IC1-NEXT: [[EC:%.*]] = icmp eq ptr [[PTR_IV_NEXT]], [[END]]
-; IC1-NEXT: br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]]
+; IC1-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP3:![0-9]+]]
; IC1: [[EXIT]]:
; IC1-NEXT: ret void
;
; IC2-LABEL: define void @switch_default_to_latch_common_dest(
; IC2-SAME: ptr [[START:%.*]], ptr [[END:%.*]]) #[[ATTR0:[0-9]+]] {
; IC2-NEXT: [[ENTRY:.*]]:
+; IC2-NEXT: [[START2:%.*]] = ptrtoint ptr [[START]] to i64
+; IC2-NEXT: [[END1:%.*]] = ptrtoint ptr [[END]] to i64
+; IC2-NEXT: [[TMP0:%.*]] = add i64 [[END1]], -8
+; IC2-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
+; IC2-NEXT: [[TMP2:%.*]] = lshr i64 [[TMP1]], 3
+; IC2-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
+; IC2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 8
+; IC2-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; IC2: [[VECTOR_PH]]:
+; IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 8
+; IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
+; IC2-NEXT: [[TMP4:%.*]] = mul i64 [[N_VEC]], 8
+; IC2-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP4]]
+; IC2-NEXT: br label %[[VECTOR_BODY:.*]]
+; IC2: [[VECTOR_BODY]]:
+; IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IC2-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 8
+; IC2-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], 0
+; IC2-NEXT: [[TMP6:%.*]] = add i64 [[OFFSET_IDX]], 32
+; IC2-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP5]]
+; IC2-NEXT: [[NEXT_GEP3:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP6]]
+; IC2-NEXT: [[TMP7:%.*]] = getelementptr i64, ptr [[NEXT_GEP]], i32 0
+; IC2-NEXT: [[TMP8:%.*]] = getelementptr i64, ptr [[NEXT_GEP]], i32 4
+; IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP7]], align 1
+; IC2-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i64>, ptr [[TMP8]], align 1
+; IC2-NEXT: [[TMP9:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 -12, i64 -12, i64 -12, i64 -12>
+; IC2-NEXT: [[TMP10:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD4]], <i64 -12, i64 -12, i64 -12, i64 -12>
+; IC2-NEXT: [[TMP11:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 13, i64 13, i64 13, i64 13>
+; IC2-NEXT: [[TMP12:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD4]], <i64 13, i64 13, i64 13, i64 13>
+; IC2-NEXT: [[TMP13:%.*]] = or <4 x i1> [[TMP9]], [[TMP11]]
+; IC2-NEXT: [[TMP14:%.*]] = or <4 x i1> [[TMP10]], [[TMP12]]
+; IC2-NEXT: [[TMP15:%.*]] = or <4 x i1> [[TMP13]], [[TMP13]]
+; IC2-NEXT: [[TMP16:%.*]] = or <4 x i1> [[TMP14]], [[TMP14]]
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 42, i64 42, i64 42, i64 42>, ptr [[TMP7]], i32 1, <4 x i1> [[TMP15]])
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 42, i64 42, i64 42, i64 42>, ptr [[TMP8]], i32 1, <4 x i1> [[TMP16]])
+; IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
+; IC2-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; IC2-NEXT: br i1 [[TMP17]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; IC2: [[MIDDLE_BLOCK]]:
+; IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
+; IC2-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; IC2: [[SCALAR_PH]]:
+; IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[START]], %[[ENTRY]] ]
; IC2-NEXT: br label %[[LOOP_HEADER:.*]]
; IC2: [[LOOP_HEADER]]:
-; IC2-NEXT: [[PTR_IV:%.*]] = phi ptr [ [[START]], %[[ENTRY]] ], [ [[PTR_IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; IC2-NEXT: [[PTR_IV:%.*]] = phi ptr [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[PTR_IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
; IC2-NEXT: [[L:%.*]] = load i64, ptr [[PTR_IV]], align 1
; IC2-NEXT: switch i64 [[L]], label %[[LOOP_LATCH]] [
; IC2-NEXT: i64 -12, label %[[IF_THEN:.*]]
@@ -41,7 +118,7 @@ define void @switch_default_to_latch_common_dest(ptr %start, ptr %end) {
; IC2: [[LOOP_LATCH]]:
; IC2-NEXT: [[PTR_IV_NEXT]] = getelementptr inbounds i64, ptr [[PTR_IV]], i64 1
; IC2-NEXT: [[EC:%.*]] = icmp eq ptr [[PTR_IV_NEXT]], [[END]]
-; IC2-NEXT: br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]]
+; IC2-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP3:![0-9]+]]
; IC2: [[EXIT]]:
; IC2-NEXT: ret void
;
@@ -73,9 +150,48 @@ define void @switch_all_dests_distinct(ptr %start, ptr %end) {
; IC1-LABEL: define void @switch_all_dests_distinct(
; IC1-SAME: ptr [[START:%.*]], ptr [[END:%.*]]) #[[ATTR0]] {
; IC1-NEXT: [[ENTRY:.*]]:
+; IC1-NEXT: [[START2:%.*]] = ptrtoint ptr [[START]] to i64
+; IC1-NEXT: [[END1:%.*]] = ptrtoint ptr [[END]] to i64
+; IC1-NEXT: [[TMP0:%.*]] = add i64 [[END1]], -8
+; IC1-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
+; IC1-NEXT: [[TMP2:%.*]] = lshr i64 [[TMP1]], 3
+; IC1-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
+; IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 4
+; IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; IC1: [[VECTOR_PH]]:
+; IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 4
+; IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
+; IC1-NEXT: [[TMP4:%.*]] = mul i64 [[N_VEC]], 8
+; IC1-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP4]]
+; IC1-NEXT: br label %[[VECTOR_BODY:.*]]
+; IC1: [[VECTOR_BODY]]:
+; IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IC1-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 8
+; IC1-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], 0
+; IC1-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP5]]
+; IC1-NEXT: [[TMP6:%.*]] = getelementptr i64, ptr [[NEXT_GEP]], i32 0
+; IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP6]], align 1
+; IC1-NEXT: [[TMP7:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], zeroinitializer
+; IC1-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 1, i64 1, i64 1, i64 1>, ptr [[TMP6]], i32 1, <4 x i1> [[TMP7]])
+; IC1-NEXT: [[TMP8:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 13, i64 13, i64 13, i64 13>
+; IC1-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> zeroinitializer, ptr [[TMP6]], i32 1, <4 x i1> [[TMP8]])
+; IC1-NEXT: [[TMP9:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 -12, i64 -12, i64 -12, i64 -12>
+; IC1-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 42, i64 42, i64 42, i64 42>, ptr [[TMP6]], i32 1, <4 x i1> [[TMP9]])
+; IC1-NEXT: [[TMP10:%.*]] = or <4 x i1> [[TMP9]], [[TMP8]]
+; IC1-NEXT: [[TMP11:%.*]] = or <4 x i1> [[TMP10]], [[TMP7]]
+; IC1-NEXT: [[TMP12:%.*]] = xor <4 x i1> [[TMP11]], <i1 true, i1 true, i1 true, i1 true>
+; IC1-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 2, i64 2, i64 2, i64 2>, ptr [[TMP6]], i32 1, <4 x i1> [[TMP12]])
+; IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; IC1-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; IC1-NEXT: br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; IC1: [[MIDDLE_BLOCK]]:
+; IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
+; IC1-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; IC1: [[SCALAR_PH]]:
+; IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[START]], %[[ENTRY]] ]
; IC1-NEXT: br label %[[LOOP_HEADER:.*]]
; IC1: [[LOOP_HEADER]]:
-; IC1-NEXT: [[PTR_IV:%.*]] = phi ptr [ [[START]], %[[ENTRY]] ], [ [[PTR_IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; IC1-NEXT: [[PTR_IV:%.*]] = phi ptr [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[PTR_IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
; IC1-NEXT: [[L:%.*]] = load i64, ptr [[PTR_IV]], align 1
; IC1-NEXT: switch i64 [[L]], label %[[DEFAULT:.*]] [
; IC1-NEXT: i64 -12, label %[[IF_THEN_1:.*]]
@@ -97,16 +213,69 @@ define void @switch_all_dests_distinct(ptr %start, ptr %end) {
; IC1: [[LOOP_LATCH]]:
; IC1-NEXT: [[PTR_IV_NEXT]] = getelementptr inbounds i64, ptr [[PTR_IV]], i64 1
; IC1-NEXT: [[EC:%.*]] = icmp eq ptr [[PTR_IV_NEXT]], [[END]]
-; IC1-NEXT: br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]]
+; IC1-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
; IC1: [[EXIT]]:
; IC1-NEXT: ret void
;
; IC2-LABEL: define void @switch_all_dests_distinct(
; IC2-SAME: ptr [[START:%.*]], ptr [[END:%.*]]) #[[ATTR0]] {
; IC2-NEXT: [[ENTRY:.*]]:
+; IC2-NEXT: [[START2:%.*]] = ptrtoint ptr [[START]] to i64
+; IC2-NEXT: [[END1:%.*]] = ptrtoint ptr [[END]] to i64
+; IC2-NEXT: [[TMP0:%.*]] = add i64 [[END1]], -8
+; IC2-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[START2]]
+; IC2-NEXT: [[TMP2:%.*]] = lshr i64 [[TMP1]], 3
+; IC2-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
+; IC2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 8
+; IC2-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; IC2: [[VECTOR_PH]]:
+; IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 8
+; IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
+; IC2-NEXT: [[TMP4:%.*]] = mul i64 [[N_VEC]], 8
+; IC2-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP4]]
+; IC2-NEXT: br label %[[VECTOR_BODY:.*]]
+; IC2: [[VECTOR_BODY]]:
+; IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; IC2-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 8
+; IC2-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], 0
+; IC2-NEXT: [[TMP6:%.*]] = add i64 [[OFFSET_IDX]], 32
+; IC2-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP5]]
+; IC2-NEXT: [[NEXT_GEP3:%.*]] = getelementptr i8, ptr [[START]], i64 [[TMP6]]
+; IC2-NEXT: [[TMP7:%.*]] = getelementptr i64, ptr [[NEXT_GEP]], i32 0
+; IC2-NEXT: [[TMP8:%.*]] = getelementptr i64, ptr [[NEXT_GEP]], i32 4
+; IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP7]], align 1
+; IC2-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i64>, ptr [[TMP8]], align 1
+; IC2-NEXT: [[TMP9:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], zeroinitializer
+; IC2-NEXT: [[TMP10:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD4]], zeroinitializer
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 1, i64 1, i64 1, i64 1>, ptr [[TMP7]], i32 1, <4 x i1> [[TMP9]])
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 1, i64 1, i64 1, i64 1>, ptr [[TMP8]], i32 1, <4 x i1> [[TMP10]])
+; IC2-NEXT: [[TMP11:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 13, i64 13, i64 13, i64 13>
+; IC2-NEXT: [[TMP12:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD4]], <i64 13, i64 13, i64 13, i64 13>
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> zeroinitializer, ptr [[TMP7]], i32 1, <4 x i1> [[TMP11]])
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> zeroinitializer, ptr [[TMP8]], i32 1, <4 x i1> [[TMP12]])
+; IC2-NEXT: [[TMP13:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 -12, i64 -12, i64 -12, i64 -12>
+; IC2-NEXT: [[TMP14:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD4]], <i64 -12, i64 -12, i64 -12, i64 -12>
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 42, i64 42, i64 42, i64 42>, ptr [[TMP7]], i32 1, <4 x i1> [[TMP13]])
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 42, i64 42, i64 42, i64 42>, ptr [[TMP8]], i32 1, <4 x i1> [[TMP14]])
+; IC2-NEXT: [[TMP15:%.*]] = or <4 x i1> [[TMP13]], [[TMP11]]
+; IC2-NEXT: [[TMP16:%.*]] = or <4 x i1> [[TMP14]], [[TMP12]]
+; IC2-NEXT: [[TMP17:%.*]] = or <4 x i1> [[TMP15]], [[TMP9]]
+; IC2-NEXT: [[TMP18:%.*]] = or <4 x i1> [[TMP16]], [[TMP10]]
+; IC2-NEXT: [[TMP19:%.*]] = xor <4 x i1> [[TMP17]], <i1 true, i1 true, i1 true, i1 true>
+; IC2-NEXT: [[TMP20:%.*]] = xor <4 x i1> [[TMP18]], <i1 true, i1 true, i1 true, i1 true>
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 2, i64 2, i64 2, i64 2>, ptr [[TMP7]], i32 1, <4 x i1> [[TMP19]])
+; IC2-NEXT: call void @llvm.masked.store.v4i64.p0(<4 x i64> <i64 2, i64 2, i64 2, i64 2>, ptr [[TMP8]], i32 1, <4 x i1> [[TMP20]])
+; IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
+; IC2-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; IC2-NEXT: br i1 [[TMP21]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; IC2: [[MIDDLE_BLOCK]]:
+; IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
+; IC2-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; IC2: [[SCALAR_PH]]:
+; IC2-NEXT: [[BC_RES...
[truncated]
Great to see that switch statements can be vectorized by only teaching createEdgeMask() how to mask them! This conceptually replaces a switch statement with a series of non-cascading conditional branches - one per unique successor, along with if-converting them. In the long run each of these steps may be modelled separately in VPlan.
Also great to see the tests pre-committed! Will review them next.
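The "one non-cascading conditional branch per unique successor, then if-convert" view above can be sketched in scalar form. The function names are hypothetical; the -12/13/42 values mirror the test below, where two cases share one destination.

```cpp
#include <cassert>
#include <cstdint>

// Original control flow: a switch whose -12 and 13 cases share a destination.
int64_t processSwitch(int64_t L) {
  switch (L) {
  case -12:
    return 42;
  case 13:
    return 42; // shares a destination with the -12 case
  default:
    return L;
  }
}

// Conceptual replacement: one independent condition per unique successor,
// with each successor's effect predicated (if-converted) on its mask.
int64_t processIfConverted(int64_t L) {
  bool MThen = (L == -12) || (L == 13); // mask for the shared destination
  int64_t R = L;                        // default keeps the incoming value
  if (MThen)
    R = 42; // predicated update, one per unique successor
  return R;
}
```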
@@ -1,7 +1,7 @@
// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -emit-llvm -Rpass-analysis -S %s -o - 2>&1 | FileCheck %s --check-prefix=RPASS
// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -emit-llvm -S %s -o - 2>&1 | FileCheck %s

-// RPASS: {{.*}}:12:5: remark: loop not vectorized: loop contains a switch statement
+// RPASS-NOT: {{.*}}:12:5: remark: loop not vectorized
// CHECK-NOT: remark: loop not vectorized: loop contains a switch statement
This is already CHECK-NOT'd before the patch?
I think that's to check that the message doesn't get emitted when -Rpass-analysis
isn't passed. The function in the test likely needs to be replaced to preserve checking the plumbing for Rpass-analysis.
Test adjusted to use a non-vectorizable call: 4b3bc46
Changes here are gone
@@ -1348,11 +1348,11 @@ bool LoopVectorizationLegality::canVectorizeWithIfConvert() {
  // Collect the blocks that need predication.
  for (BasicBlock *BB : TheLoop->blocks()) {
    // We don't support switch statements inside loops.
-// We don't support switch statements inside loops.
+// We support only branches and switch statements as terminators inside the loop.
Updated, thanks!
if (IsDefault) {
  if (C.getCaseSuccessor() == Dst)
    continue;
} else if (C.getCaseSuccessor() != Dst)
-if (IsDefault) {
-  if (C.getCaseSuccessor() == Dst)
-    continue;
-} else if (C.getCaseSuccessor() != Dst)
+bool IsCaseSuccessor = C.getCaseSuccessor() == Dst;
+if ((!IsDefault && !IsCaseSuccessor) || (IsDefault && IsCaseSuccessor))
+  continue;

(can fold into if (IsDefault == IsCaseSuccessor) if that seems better)
Folded, thanks!
VPValue *Mask = nullptr;
VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition(), Plan);
bool IsDefault = SI->getDefaultDest() == Dst;
for (auto &C : SI->cases()) {
Note that the number of cases N may be large. All N cases are traversed here per successor, potentially in O(N^2). This potential compile-time concern could be addressed by building a reverse mapping from destinations to conditions, or creating all edge masks from Src to all its successors together.
There's also a potential cost/profitability concern: as N increases the performance advantage of vectorizing diminishes, as it applies if-conversion with a mask per successor, while the original scalar version may use table lookups or cascading conditional branches. Worth bounding the number of cases to something "reasonable", and/or checking that CM considers the associated cost "accurately"?
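The suggested reverse mapping could look like the sketch below. Names and types are hypothetical stand-ins (destinations are plain ints, and case values stand in for the compares that would later be created per destination); the point is the single O(N) pass over the cases.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One pass over the N cases builds destination -> matching case values,
// so the masks for all successors can be created in O(N) total instead
// of rescanning all N cases once per successor.
std::map<int, std::vector<int64_t>>
groupCasesByDest(const std::vector<std::pair<int64_t, int>> &Cases,
                 int DefaultDst) {
  std::map<int, std::vector<int64_t>> Dest2Vals;
  for (const auto &[Val, Dest] : Cases)
    if (Dest != DefaultDst) // cases that reach the default are redundant
      Dest2Vals[Dest].push_back(Val);
  return Dest2Vals;
}
```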
Rewritten to compute all masks once, thanks
Also added cost estimate
//    where destination == Dst using a logical OR.
// 2. Dst is the default destination. Dst is reached if none of the cases
//    with destination != Dst are taken. Join the conditions for each case
//    where the destination is != Dst using a logical OR and negate it.
Cases whose destinations are the same as default are redundant and can be ignored/eliminated - they will get there anyhow. Perhaps slightly clearer to filter them as such during the traversal.
They are filtered out during the loop below I think.
VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition(), Plan);
bool IsDefault = SI->getDefaultDest() == Dst;
This is fine. Worth considering the alternative of computing the default case by recursively computing (and recording, or retrieving) the masks of all other edges from Src to its successors, excluding default. Note that there may be no such other edges, in which case the desired edge mask is that of Src (worth a test).
When computing the non-default case, all relevant edges need to be computed - the cache was checked at the beginning.
-VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition(), Plan);
-bool IsDefault = SI->getDefaultDest() == Dst;
+if (SI->getDefaultDest() == Dst) {
+  VPValue *AllOtherEdgesMask = nullptr;
+  SmallPtrSet<BasicBlock *, 4> UniqueSuccessors(succ_begin(Src), succ_end(Src));
+  for (auto *Successor : UniqueSuccessors) {
+    if (Successor == Dst)
+      continue;
+    VPValue *OtherEdgeMask = createEdgeMask(Src, Successor);
+    if (AllOtherEdgesMask)
+      AllOtherEdgesMask = Builder.createLogicalOr(AllOtherEdgesMask, OtherEdgeMask);
+    else
+      AllOtherEdgesMask = OtherEdgeMask;
+  }
+  if (!AllOtherEdgesMask) {
+    assert(UniqueSuccessors.size() == 1 && "Src expected to have only default successor");
+    return EdgeMaskCache[Edge] = SrcMask;
+  }
+  return EdgeMaskCache[Edge] = Builder.createNot(AllOtherEdgesMask);
+}
+VPValue *EdgeMask = nullptr;
+VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition(), Plan);
+for (auto &C : SI->cases()) {
+  if (C.getCaseSuccessor() != Dst)
+    continue;
+  // Edge for case C is not in EdgeMaskCache, create it.
+  VPValue *V = getVPValueOrAddLiveIn(C.getCaseValue(), Plan);
+  VPValue *CMask = Builder.createICmp(CmpInst::ICMP_EQ, Cond, V);
+  if (EdgeMask)
+    EdgeMask = Builder.createLogicalOr(EdgeMask, CMask);
+  else
+    EdgeMask = CMask;
+}
+assert(EdgeMask && "mask must be created");
+return EdgeMaskCache[Edge] = EdgeMask;
}
Rewritten to compute all masks once, thanks
    Eq = Builder.createICmp(CmpInst::ICMP_EQ, Cond, V);
  }
  if (Mask)
    Mask = Builder.createOr(Mask, Eq);
-Mask = Builder.createOr(Mask, Eq);
+Mask = Builder.createLogicalOr(Mask, Eq);
I think a logical OR shouldn't be needed; the only case where the created compares can be poison is when the switch condition is poison, and in that case all compares will be poison, so there is no difference between a bitwise and a logical OR.
Worth a comment?
The commit message should also be updated to read OR rather than logical OR, to be consistent.
  else
    Mask = Eq;
}
if (IsDefault)
Mask may be null in this case, avoid Not'ing it if so.
👍🏻
@@ -7763,6 +7763,41 @@ VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {

  VPValue *SrcMask = getBlockInMask(Src);

  if (auto *SI = dyn_cast<SwitchInst>(Src->getTerminator())) {
Worth asserting that SI stays in the same loop iteration, rather than breaking or continuing to its header? E.g., that !OrigLoop->isLoopExiting(Src).
Added assert here and check to LoopVectorizationLegality, thanks!
  } else if (C.getCaseSuccessor() != Dst)
    continue;

  VPValue *Eq = EdgeMaskCache.lookup({Src, C.getCaseSuccessor()});
This may succeed only in the IsDefault case, and may lead to OR'ing multiple instances of the same mask - if the same (non-default) successor is the destination of multiple cases. Should be easily folded later though.
Yes that is currently happening for some test cases, but should be cleaned up by VPlan-to-VPlan recipe simplification (follow-up needed)
Add tests for VPlan printing?
if (auto *SI = dyn_cast<SwitchInst>(Src->getTerminator())) {
  // Create mask where the terminator in Src is a switch. We need to handle 2
  // separate cases:
  // 1. Dst is not the default desintation. Dst is reached if any of the cases
desintation -> destination
Fixed, thanks!
; IC1-NEXT:    [[TMP7:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 -12, i64 -12, i64 -12, i64 -12>
; IC1-NEXT:    [[TMP8:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 13, i64 13, i64 13, i64 13>
; IC1-NEXT:    [[TMP9:%.*]] = or <4 x i1> [[TMP7]], [[TMP8]]
; IC1-NEXT:    [[TMP10:%.*]] = or <4 x i1> [[TMP9]], [[TMP9]]
redundant
Good catch!
This stems from having parallel {pred, succ} edges in the CFG for multiple switch cases that share a common succ destination, coupled with caching a single edge mask per {pred, succ} pair of VPBB's.
The in mask of a VPBB is created by ORing the edge masks from its predecessors - suffice to visit each predecessor once.
Yes, could be done by de-duplicating predecessors (or VPlan based simplification)
Case should also apply to conditional branches whose two destinations are the same - a case optimized by createEdgeMask() - but not by its user createBlockInMask().
Extra tests for #99808, including cost model tests.
// collect compares for all cases once.
VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition(), Plan);
BasicBlock *DefaultDst = SI->getDefaultDest();
MapVector<BasicBlock *, SmallVector<VPValue *>> Map;
-  MapVector<BasicBlock *, SmallVector<VPValue *>> Map;
+  MapVector<BasicBlock *, SmallVector<VPValue *>> Destination2Compares;
or some shorter yet meaningful name.
Updated to `Dst2Compares`, thanks!
// 1. Dst is not the default destination. Dst is reached if any of the cases
// with destination == Dst are taken. Join the conditions for each case
// where destination == Dst using a logical OR.
for (const auto &[Dst, Conds] : Map) {
`Dst` is also a parameter to the method. Better outline this part, which computes masks for all edges of a switch, to a separate `createSwitchEdgeMasks()` method?
outlined, thanks!
// We need to handle 2 separate cases:
// 1. Dst is not the default destination. Dst is reached if any of the cases
// with destination == Dst are taken. Join the conditions for each case
// where destination == Dst using a logical OR.
-  // where destination == Dst using a logical OR.
+  // whose destination == Dst using an OR.
updated, thanks!
// 2. Dst is the default destination. Dst is reached if none of the cases
// with destination != Dst are taken. Join the conditions for each case
// where the destination is != Dst using a logical OR and negate it.
-  // where the destination is != Dst using a logical OR and negate it.
+  // where the destination is != Dst using an OR and negate it.
Thanks, updated
BasicBlock *DefaultDst = SI->getDefaultDest();
MapVector<BasicBlock *, SmallVector<VPValue *>> Map;
for (auto &C : SI->cases()) {
  auto I = Map.insert({C.getCaseSuccessor(), {}});
-  auto I = Map.insert({C.getCaseSuccessor(), {}});
+  // Cases whose destination is the same as default are redundant and can be ignored - they will get there anyhow.
+  if (C.getCaseSuccessor() == DefaultDst)
+    continue;
+  auto I = Map.insert({C.getCaseSuccessor(), {}});
This one should also be updated
Mask = Builder.createOr(Mask, V);
if (SrcMask)
  Mask = Builder.createLogicalAnd(SrcMask, Mask);
EdgeMaskCache[{Src, Dst}] = Mask;
-  EdgeMaskCache[{Src, Dst}] = Mask;
+  EdgeMaskCache[{Src, Dst}] = Mask;
+  DefaultMask = !DefaultMask ? Mask : Builder.createOr(DefaultMask, Mask);
can also collect DefaultMask in this loop, to be finalized and stored after it.
Updated, thanks!
; IC1-NEXT: [[TMP7:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 -12, i64 -12, i64 -12, i64 -12>
; IC1-NEXT: [[TMP8:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 13, i64 13, i64 13, i64 13>
; IC1-NEXT: [[TMP9:%.*]] = or <4 x i1> [[TMP7]], [[TMP8]]
; IC1-NEXT: [[TMP10:%.*]] = or <4 x i1> [[TMP9]], [[TMP9]]
Case should also apply to conditional branches whose two destinations are the same - a case optimized by createEdgeMask() - but not by its user createBlockInMask().
Eq = Builder.createICmp(CmpInst::ICMP_EQ, Cond, V);
}
if (Mask)
  Mask = Builder.createOr(Mask, Eq);
Worth a comment?
The commit message should also be updated to read OR rather than logical OR, to be consistent.
Support for vectorizing switch statements will be added in #99808. Update the loop to use a call that cannot be vectorized to preserve testing surfacing analysis remarks via the frontend.
This looks good to me, thanks for accommodating! Left behind several comments.
@@ -6456,6 +6456,17 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
// a predicated block since it will become a fall-through, although we
// may decide in the future to call TTI for all branches.
}
case Instruction::Switch: {
  if (VF.isScalar())
    return TTI.getCFInstrCost(Instruction::Switch, CostKind);
The above scalar cost seems right, wonder about the vector cost below - the cost associated with predicating conditional branches is collected when visiting each phi, rather than the branch itself. May be good to calibrate with some tests, can leave behind a TODO to be done separately.
The vector cost matches the cost of the generated masks, which are costed explicitly for the version with branches due to the compares being explicit instructions. Currently it seems more like the scalar cost may be estimated by getCFInstrCost, but that probably would need to be fixed in TTI.
@@ -7839,6 +7850,60 @@ VPRecipeBuilder::mapToVPValues(User::op_range Operands) {
  return map_range(Operands, Fn);
}

void VPRecipeBuilder::createSwitchEdgeMasks(SwitchInst *SI) {
  BasicBlock *Src = SI->getParent();
-  BasicBlock *Src = SI->getParent();
+  BasicBlock *Src = SI->getParent();
+  assert(!EdgeMaskCache.contains(Src) && "Edge masks already created");
?
Added but moved down to iterating over all cases, as we need to look up (Src, Dst) pairs, thanks!
/// Create masks for all cases with destination different than the default
/// destination, and a mask for the default destination.
-  /// Create masks for all cases with destination different than the default
-  /// destination, and a mask for the default destination.
+  /// Create an edge mask for every destination of cases and/or default.
Done, thanks!
; COST-NEXT: [[TMP7:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 -12, i64 -12, i64 -12, i64 -12>
; COST-NEXT: [[TMP8:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 13, i64 13, i64 13, i64 13>
; COST-NEXT: [[TMP9:%.*]] = or <4 x i1> [[TMP7]], [[TMP8]]
; COST-NEXT: [[TMP10:%.*]] = or <4 x i1> [[TMP9]], [[TMP9]]
It may be good to have alongside a version doing the same with a conditional branch, i.e., "if (%1 == -12) || (%1 == 13) *ptr_iv = 42". (But avoid getting them folded into a switch as in the last test at the bottom.) Could be useful for calibrating the associated costs, in addition to comparing the generated codes.
Added a variation, decision is the same.
@@ -104,9 +181,62 @@ define void @switch_all_dests_distinct(ptr %start, ptr %end) {
; FORCED-LABEL: define void @switch_all_dests_distinct(
Sanity check: is it indeed unprofitable to vectorize this case having four distinct destinations, say by VF=4 and UF=1, as decided by COST?
I checked and the reason for not vectorizing is that `getCFInstrCost` considers `Switch` as free on X86.
@@ -162,21 +343,21 @@ define void @switch_to_header(ptr %start) {
; IC1-NEXT: [[ENTRY:.*]]:
; IC1-NEXT: br label %[[LOOP_HEADER:.*]]
; IC1: [[LOOP_HEADER]]:
; IC1-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP_HEADER_BACKEDGE:.*]] ]
; IC1-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[IF_THEN1:.*]] ]
If/how are these changes in block labels (and those below) related to this patch?
Re-generated check lines separately, changes here should be gone now 5286656
; IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[START]], i64 [[TMP0]]
; IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
; IC1-NEXT: store <2 x i64> <i64 42, i64 42>, ptr [[TMP2]], align 1
Nice!
; IC2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[START]], i64 [[TMP0]]
; IC2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[START]], i64 [[TMP1]]
; IC2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 0
; IC2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 2
Is TMP3 (and TMP1) dead?
Yes, this is due to the fact that VPVectorPointerRecipe is responsible for generating pointers for all parts, using only the first part of the pointer operand, but needs marking as only-uses-first-part. Will do separately, thanks!
@@ -11,21 +11,71 @@ define dso_local void @test(ptr %start, ptr %end) #0 {
; CHECK-LABEL: @test(
Time to replace the above FIXME with a comment that under -O2 adjacent branches get fused into a switch statement before vectorization.
Dropped, thanks!
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, ptr [[NEXT_GEP]], align 4
; CHECK-NEXT: [[WIDE_LOAD8:%.*]] = load <8 x i32>, ptr [[TMP5]], align 4
; CHECK-NEXT: [[WIDE_LOAD9:%.*]] = load <8 x i32>, ptr [[TMP6]], align 4
; CHECK-NEXT: [[WIDE_LOAD10:%.*]] = load <8 x i32>, ptr [[TMP7]], align 4
Sanity check: is the decision to vectorize by VF=8, UF=4 taken by COST according to cost estimates involving a two-case switch, profitable? I.e., comparable to the decision of a conditional branch (w/o being folded into a switch).
Yep, the decisions agree
Regenerate check lines for test to avoid unrelated changes in #99808.
Update createEdgeMask to create masks where the terminator in Src is a switch. We need to handle 2 separate cases: 1. Dst is not the default destination. Dst is reached if any of the cases with destination == Dst are taken. Join the conditions for each case where destination == Dst using a logical OR. 2. Dst is the default destination. Dst is reached if none of the cases with destination != Dst are taken. Join the conditions for each case where the destination is != Dst using a logical OR and negate it. Fixes llvm#48188.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/113/builds/2099 Here is the relevant piece of the build log for the reference:
The buildbot failure above was due to not considering |
VPVectorPointerRecipe only uses the first part of the pointer operand, so mark it accordingly. Follow-up suggested as part of #99808.
Update createEdgeMask to create masks where the terminator in Src is a switch. We need to handle 2 separate cases:
Fixes #48188.