[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. #106431
Conversation
Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.
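For illustration, a minimal sketch of the kind of source loop that forms an interleave group; the function name, types, and loop shape are assumptions for this example and are not taken from the PR or its tests:

```cpp
// Hypothetical loop: the two stores form an interleave group of factor 2,
// where q[2 * i] is member 0 (lowest address in the group) and q[2 * i + 1]
// is member 1. For store groups the insert position is the last member.
void store_pair(char *q, char a, char b, int n) {
  for (int i = 0; i < n; i++) {
    q[2 * i] = a;     // member 0
    q[2 * i + 1] = b; // member 1, the group's insert position
  }
}
```

Previously the widened store addressed the group through the insert position's member and stepped back with a negative GEP index (see the `getelementptr i8, ptr [[TMP6]], i32 -1` lines removed from the masked_strided2 checks below); with this change the group is addressed directly through member 0 and that adjustment disappears.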
@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.

Patch is 32.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106431.diff

10 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cb104c4ed2d03d..1e92bda5494248 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8993,8 +8993,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// Interleave memory: for each Interleave Group we marked earlier as relevant
// for this VPlan, replace the Recipes widening its memory instructions with a
// single VPInterleaveRecipe at its insertion point.
- VPlanTransforms::createInterleaveGroups(InterleaveGroups, RecipeBuilder,
- CM.isScalarEpilogueAllowed());
+ VPlanTransforms::createInterleaveGroups(
+ *Plan, InterleaveGroups, RecipeBuilder, CM.isScalarEpilogueAllowed());
for (ElementCount VF : Range)
Plan->addVF(VF);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e7ea5cb23b90d3..b08cba798a9a9b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1330,6 +1330,13 @@ class VPInstruction : public VPRecipeWithIRFlags {
assert(Opcode == Instruction::Or && "only OR opcodes can be disjoint");
}
+ VPInstruction(VPValue *Ptr, VPValue *Offset, bool InBounds, DebugLoc DL = {},
+ const Twine &Name = "")
+ : VPRecipeWithIRFlags(VPDef::VPInstructionSC,
+ ArrayRef<VPValue *>({Ptr, Offset}),
+ GEPFlagsTy(InBounds), DL),
+ Opcode(VPInstruction::PtrAdd), Name(Name.str()) {}
+
VPInstruction(unsigned Opcode, std::initializer_list<VPValue *> Operands,
FastMathFlags FMFs, DebugLoc DL = {}, const Twine &Name = "");
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index f84317ba51257a..df4e3a54812626 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -646,7 +646,9 @@ Value *VPInstruction::generatePerPart(VPTransformState &State, unsigned Part) {
"can only generate first lane for PtrAdd");
Value *Ptr = State.get(getOperand(0), Part, /* IsScalar */ true);
Value *Addend = State.get(getOperand(1), Part, /* IsScalar */ true);
- return Builder.CreatePtrAdd(Ptr, Addend, Name);
+ return Builder.CreatePtrAdd(Ptr, Addend, Name,
+ isInBounds() ? GEPNoWrapFlags::inBounds()
+ : GEPNoWrapFlags::none());
}
case VPInstruction::ResumePhi: {
if (Part != 0)
@@ -2393,7 +2395,6 @@ void VPInterleaveRecipe::execute(VPTransformState &State) {
// Prepare for the new pointers.
SmallVector<Value *, 2> AddrParts;
- unsigned Index = Group->getIndex(Instr);
// TODO: extend the masked interleaved-group support to reversed access.
VPValue *BlockInMask = getMask();
@@ -2413,10 +2414,11 @@ void VPInterleaveRecipe::execute(VPTransformState &State) {
Idx = State.Builder.CreateSub(RuntimeVF, State.Builder.getInt32(1));
Idx = State.Builder.CreateMul(Idx,
State.Builder.getInt32(Group->getFactor()));
- Idx = State.Builder.CreateAdd(Idx, State.Builder.getInt32(Index));
Idx = State.Builder.CreateNeg(Idx);
- } else
- Idx = State.Builder.getInt32(-Index);
+ } else {
+ // TODO: Drop redundant 0-index GEP as follow-up.
+ Idx = State.Builder.getInt32(0);
+ }
VPValue *Addr = getAddr();
for (unsigned Part = 0; Part < State.UF; Part++) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index ee7c7cea0b7670..4dae6a54b017f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -596,8 +596,7 @@ static void legalizeAndOptimizeInductions(VPlan &Plan, ScalarEvolution &SE) {
Plan, InductionDescriptor::IK_IntInduction, Instruction::Add, nullptr,
SE, nullptr, StartV, StepV, InsertPt);
- auto *Recipe = new VPInstruction(VPInstruction::PtrAdd,
- {PtrIV->getStartValue(), Steps},
+ auto *Recipe = new VPInstruction(PtrIV->getStartValue(), Steps, false,
PtrIV->getDebugLoc(), "next.gep");
Recipe->insertAfter(Steps);
@@ -1522,14 +1521,19 @@ void VPlanTransforms::dropPoisonGeneratingRecipes(
}
void VPlanTransforms::createInterleaveGroups(
- const SmallPtrSetImpl<const InterleaveGroup<Instruction> *> &InterleaveGroups,
+ VPlan &Plan,
+ const SmallPtrSetImpl<const InterleaveGroup<Instruction> *>
+ &InterleaveGroups,
VPRecipeBuilder &RecipeBuilder, bool ScalarEpilogueAllowed) {
+ if (InterleaveGroups.empty())
+ return;
+
// Interleave memory: for each Interleave Group we marked earlier as relevant
// for this VPlan, replace the Recipes widening its memory instructions with a
// single VPInterleaveRecipe at its insertion point.
+ VPDominatorTree VPDT;
+ VPDT.recalculate(Plan);
for (const auto *IG : InterleaveGroups) {
- auto *Recipe =
- cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getInsertPos()));
SmallVector<VPValue *, 4> StoredValues;
for (unsigned i = 0; i < IG->getFactor(); ++i)
if (auto *SI = dyn_cast_or_null<StoreInst>(IG->getMember(i))) {
@@ -1539,9 +1543,39 @@ void VPlanTransforms::createInterleaveGroups(
bool NeedsMaskForGaps =
IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed;
- auto *VPIG = new VPInterleaveRecipe(IG, Recipe->getAddr(), StoredValues,
- Recipe->getMask(), NeedsMaskForGaps);
- VPIG->insertBefore(Recipe);
+
+ Instruction *IRInsertPos = IG->getInsertPos();
+ auto *InsertPos =
+ cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IRInsertPos));
+ VPRecipeBase *IP = InsertPos;
+
+ // Get or create the start address for the interleave group.
+ auto *Start =
+ cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getMember(0)));
+ VPValue *Addr = Start->getAddr();
+ if (!VPDT.properlyDominates(Addr->getDefiningRecipe(), InsertPos)) {
+ bool InBounds = false;
+ if (auto *gep = dyn_cast<GetElementPtrInst>(
+ getLoadStorePointerOperand(IRInsertPos)->stripPointerCasts()))
+ InBounds = gep->isInBounds();
+
+ // We cannot re-use the address of the first member because it does not
+ // dominate the insert position. Use the address of the insert position
+ // and create a PtrAdd to adjust the index to start at the first member.
+ APInt Offset(32,
+ getLoadStoreType(IRInsertPos)->getScalarSizeInBits() / 8 *
+ IG->getIndex(IRInsertPos),
+ /*IsSigned=*/true);
+ VPValue *OffsetVPV = Plan.getOrAddLiveIn(
+ ConstantInt::get(IRInsertPos->getParent()->getContext(), -Offset));
+ Addr = new VPInstruction(InsertPos->getAddr(), OffsetVPV, InBounds);
+ Addr->getDefiningRecipe()->insertAfter(InsertPos);
+ IP = Addr->getDefiningRecipe();
+ }
+ auto *VPIG = new VPInterleaveRecipe(IG, Addr, StoredValues,
+ InsertPos->getMask(), NeedsMaskForGaps);
+ VPIG->insertAfter(IP);
+
unsigned J = 0;
for (unsigned i = 0; i < IG->getFactor(); ++i)
if (Instruction *Member = IG->getMember(i)) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index e714f69eeff1ab..434545c5ea67e1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -112,7 +112,9 @@ struct VPlanTransforms {
// widening its memory instructions with a single VPInterleaveRecipe at its
// insertion point.
static void createInterleaveGroups(
- const SmallPtrSetImpl<const InterleaveGroup<Instruction> *> &InterleaveGroups,
+ VPlan &Plan,
+ const SmallPtrSetImpl<const InterleaveGroup<Instruction> *>
+ &InterleaveGroups,
VPRecipeBuilder &RecipeBuilder, bool ScalarEpilogueAllowed);
};
diff --git a/llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll b/llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll
index cc1d11754b27ed..9383799b181c82 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll
@@ -89,12 +89,11 @@ define void @test_free_instructions_feeding_geps_for_interleave_groups(ptr noali
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x float> poison, float [[TMP40]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT]], <2 x float> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP41:%.*]] = shl i64 [[TMP39]], 2
+; CHECK-NEXT: [[TMP44:%.*]] = getelementptr float, ptr [[DST_1]], i64 [[TMP41]]
; CHECK-NEXT: [[TMP42:%.*]] = load float, ptr [[P_INVAR]], align 4
; CHECK-NEXT: [[BROADCAST_SPLATINSERT27:%.*]] = insertelement <2 x float> poison, float [[TMP42]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT28:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT27]], <2 x float> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP43:%.*]] = or disjoint i64 [[TMP41]], 3
-; CHECK-NEXT: [[TMP44:%.*]] = getelementptr float, ptr [[DST_1]], i64 [[TMP43]]
-; CHECK-NEXT: [[TMP45:%.*]] = getelementptr float, ptr [[TMP44]], i32 -3
+; CHECK-NEXT: [[TMP45:%.*]] = getelementptr float, ptr [[TMP44]], i32 0
; CHECK-NEXT: [[TMP46:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLAT]], <2 x float> [[BROADCAST_SPLAT28]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP47:%.*]] = shufflevector <4 x float> [[TMP46]], <4 x float> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x float> [[TMP47]], <8 x float> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
@@ -102,8 +101,8 @@ define void @test_free_instructions_feeding_geps_for_interleave_groups(ptr noali
; CHECK-NEXT: [[TMP48:%.*]] = load float, ptr [[P_INVAR]], align 4
; CHECK-NEXT: [[BROADCAST_SPLATINSERT29:%.*]] = insertelement <2 x float> poison, float [[TMP48]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT30:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT29]], <2 x float> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP49:%.*]] = getelementptr float, ptr [[DST_2]], i64 [[TMP43]]
-; CHECK-NEXT: [[TMP50:%.*]] = getelementptr float, ptr [[TMP49]], i32 -3
+; CHECK-NEXT: [[TMP49:%.*]] = getelementptr float, ptr [[DST_2]], i64 [[TMP41]]
+; CHECK-NEXT: [[TMP50:%.*]] = getelementptr float, ptr [[TMP49]], i32 0
; CHECK-NEXT: [[BROADCAST_SPLAT36:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLAT30]], <2 x float> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP51:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLAT36]], <4 x float> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[INTERLEAVED_VEC31:%.*]] = shufflevector <8 x float> [[TMP51]], <8 x float> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
@@ -266,8 +265,7 @@ define void @geps_feeding_interleave_groups_with_reuse(ptr %arg, i64 %arg1, ptr
; CHECK-NEXT: [[TMP35:%.*]] = fmul <2 x float> [[TMP34]], zeroinitializer
; CHECK-NEXT: [[TMP36:%.*]] = fadd <2 x float> [[STRIDED_VEC16]], [[STRIDED_VEC20]]
; CHECK-NEXT: [[TMP37:%.*]] = fmul <2 x float> [[TMP36]], zeroinitializer
-; CHECK-NEXT: [[TMP38:%.*]] = getelementptr i8, ptr [[TMP28]], i64 12
-; CHECK-NEXT: [[TMP39:%.*]] = getelementptr float, ptr [[TMP38]], i32 -3
+; CHECK-NEXT: [[TMP39:%.*]] = getelementptr float, ptr [[TMP28]], i32 0
; CHECK-NEXT: [[TMP40:%.*]] = shufflevector <2 x float> [[TMP31]], <2 x float> [[TMP33]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP41:%.*]] = shufflevector <2 x float> [[TMP35]], <2 x float> [[TMP37]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP42:%.*]] = shufflevector <4 x float> [[TMP40]], <4 x float> [[TMP41]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -490,11 +488,10 @@ define void @geps_feeding_interleave_groups_with_reuse2(ptr %A, ptr %B, i64 %N)
; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <16 x i32>, ptr [[TMP53]], align 4
; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i32> [[WIDE_VEC]], <16 x i32> poison, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
; CHECK-NEXT: [[STRIDED_VEC34:%.*]] = shufflevector <16 x i32> [[WIDE_VEC]], <16 x i32> poison, <4 x i32> <i32 1, i32 5, i32 9, i32 13>
+; CHECK-NEXT: [[TMP56:%.*]] = getelementptr i32, ptr [[A]], i64 [[TMP50]]
; CHECK-NEXT: [[TMP54:%.*]] = getelementptr i32, ptr [[B]], <4 x i64> [[VEC_IND]]
; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> [[TMP54]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> poison), !alias.scope [[META6:![0-9]+]]
-; CHECK-NEXT: [[TMP55:%.*]] = or disjoint i64 [[TMP50]], 7
-; CHECK-NEXT: [[TMP56:%.*]] = getelementptr i32, ptr [[A]], i64 [[TMP55]]
-; CHECK-NEXT: [[TMP57:%.*]] = getelementptr i32, ptr [[TMP56]], i32 -7
+; CHECK-NEXT: [[TMP57:%.*]] = getelementptr i32, ptr [[TMP56]], i32 0
; CHECK-NEXT: [[TMP58:%.*]] = shufflevector <4 x i32> [[STRIDED_VEC]], <4 x i32> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[TMP59:%.*]] = shufflevector <4 x i32> [[STRIDED_VEC34]], <4 x i32> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[TMP60:%.*]] = shufflevector <4 x i32> [[WIDE_MASKED_GATHER]], <4 x i32> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
diff --git a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
index 6b52023cfbcaec..968058134690bc 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
@@ -1419,13 +1419,11 @@ define dso_local void @masked_strided2(ptr noalias nocapture readonly %p, ptr no
; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr [[TMP2]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = or disjoint i32 [[TMP1]], 1
; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = call <8 x i8> @llvm.smax.v8i8(<8 x i8> [[STRIDED_VEC]], <8 x i8> [[STRIDED_VEC1]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP1]]
; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = sub <8 x i8> zeroinitializer, [[TMP4]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP3]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr i8, ptr [[TMP6]], i32 -1
; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP7]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP6]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
@@ -2555,13 +2553,11 @@ define dso_local void @masked_strided2_unknown_tc(ptr noalias nocapture readonly
; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr [[TMP3]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = or disjoint i32 [[TMP2]], 1
; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = call <8 x i8> @llvm.smax.v8i8(<8 x i8> [[STRIDED_VEC]], <8 x i8> [[STRIDED_VEC3]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP2]]
; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = sub <8 x i8> zeroinitializer, [[TMP6]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP5]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr i8, ptr [[TMP8]], i32 -1
; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP6]], <8 x i8> [[TMP7]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP9]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP8]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
@@ -2989,13 +2985,11 @@ define dso_local void @unconditional_masked_strided2_unknown_tc(ptr noalias noca
; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr [[TMP2]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = or disjoint i32 [[TMP1]], 1
; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = call <8 x i8> @llvm.smax.v8i8(<8 x i8> [[STRIDED_VEC]], <8 x i8> [[STRIDED_VEC3]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr [[Q:%.*]], i32 [[TMP1]]
; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = sub <8 x i8> zeroinitializer, [[TMP4]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr [[Q:%.*]], i32 [[TMP3]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i32 -1
; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP7]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP6]], i32 1, <16 x i1> [[...
[truncated]
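To make the fallback in createInterleaveGroups above concrete, a worked example of the offset computation with assumed values; the helper name and the numbers (4-byte float members, insert position at group index 3) are illustrative, not taken from a specific test:

```cpp
// Sketch of the byte-offset computation used when member 0's address does not
// dominate the insert position.
int offsetBackToMemberZero() {
  unsigned MemberBytes = 32 / 8; // getLoadStoreType(IRInsertPos)->getScalarSizeInBits() / 8
  unsigned InsertPosIdx = 3;     // IG->getIndex(IRInsertPos)
  int Offset = MemberBytes * InsertPosIdx; // 12 bytes from member 0 to the insert position
  // The emitted PtrAdd uses the negated offset: the insert position's address
  // plus (-12) bytes yields the address of member 0.
  return -Offset;
}
```

This path is only taken when member 0's address does not dominate the insert position; otherwise the address of member 0 is reused directly.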
This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on llvm#106431. Fixes llvm#82936
ping :)
The change makes sense to me, thanks!
Nice simplification! Can provide a step towards facilitating SLP.
@@ -1330,6 +1330,13 @@ class VPInstruction : public VPRecipeWithIRFlags {
    assert(Opcode == Instruction::Or && "only OR opcodes can be disjoint");
  }

  VPInstruction(VPValue *Ptr, VPValue *Offset, bool InBounds, DebugLoc DL = {},
nit: a bit odd to construct PtrAdd given two VPValues and a bool. Perhaps pass a GEPFlagsTy as third parameter instead of bool, analogous to passing FMFs?
Updated, thanks!
isInBounds() ? GEPNoWrapFlags::inBounds()
             : GEPNoWrapFlags::none());
Independent of this patch, but exercised only with it?
return Builder.CreatePtrAdd(Ptr, Addend, Name,
                            isInBounds() ? GEPNoWrapFlags::inBounds()
                                         : GEPNoWrapFlags::none());
-return Builder.CreatePtrAdd(Ptr, Addend, Name,
-                            isInBounds() ? GEPNoWrapFlags::inBounds()
-                                         : GEPNoWrapFlags::none());
+auto Flags = isInBounds() ? GEPNoWrapFlags::inBounds()
+                          : GEPNoWrapFlags::none();
+return Builder.CreatePtrAdd(Ptr, Addend, Name, Flags);
?
Updated to use CreateInBoundsPtrAdd/CreatePtrAdd.
@@ -2413,10 +2414,11 @@ void VPInterleaveRecipe::execute(VPTransformState &State) {
      Idx = State.Builder.CreateSub(RuntimeVF, State.Builder.getInt32(1));
      Idx = State.Builder.CreateMul(Idx,
                                    State.Builder.getInt32(Group->getFactor()));
Does the above comment about adjusting the index from the first vector lane (lane zero?) rather than using the pointer of the last lane still hold? If so, this adjustment is invariant and would be better placed in the preheader, possibly later by LICM?
Yes I think so, we just don't have to compute the index for the first lane (used to be done via the removed add)
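To make the reversed-access adjustment concrete, a small sketch with assumed values (RuntimeVF = 4, Group->getFactor() = 2); the helper name is invented for the example:

```cpp
// Mirrors the CreateSub/CreateMul/CreateNeg sequence in the hunk above: the
// distance, in elements, from the group's base back to the group accessed by
// the last lane. The per-member Index (the removed CreateAdd) is no longer
// folded in, because member 0 is now the base address.
int reverseGroupStartOffset(int RuntimeVF, int Factor) {
  int Idx = (RuntimeVF - 1) * Factor; // 3 * 2 = 6
  return -Idx;                        // -6
}
```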
} else {
  // TODO: Drop redundant 0-index GEP as follow-up.
  Idx = State.Builder.getInt32(0);
}

VPValue *Addr = getAddr();
for (unsigned Part = 0; Part < State.UF; Part++) {
Should the "Notice current instruction ..." explanation below be updated?
It can be dropped now I think, done, thanks!
auto *Start =
    cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getMember(0)));
VPValue *Addr = Start->getAddr();
if (!VPDT.properlyDominates(Addr->getDefiningRecipe(), InsertPos)) {
then hoist Addr's defining recipe to insert pos, possibly hoisting additional defining recipes as needed? And/or sink loads above member zero to join it.
In general, all members of an interleave group conceptually move to its insert pos, so may as well perform actual movement. This should facilitate subsequent SLP'ing.
Interleave stores are inserted at the last member, so the address of any member should be available there. Interleaved loads are inserted at the first member, so only its address is guaranteed to be available there, although others may as well.
Sounds good, should be done as potential follow-up?
Sure, worth leaving behind a TODO.
Added, thanks!
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr [[TMP4]], i32 -4
; CHECK-NEXT: [[TMP51:%.*]] = getelementptr i16, ptr [[TMP5]], i32 0
Note: here a redundant gep with zero is added, replacing TMP4[-2:i16] with TMP4[-4:i8 + 0]
; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLAT]], <4 x i32> [[BROADCAST_SPLAT2]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP12]], align 4
; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP7]], align 4
Note: this stores INTERLEAVED_VEC at A[OFFSET_IDX], rather than to A[OFFSET_IDX+(1*4)-4]
Stores of X should be dce'd.
; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP7]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP6]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
Note: Q[(TMP1+1)-1] --> Q[TMP1]
LGTM, adding a few minor comments.
@@ -1323,6 +1322,13 @@ class VPInstruction : public VPRecipeWithIRFlags,
    assert(Opcode == Instruction::Or && "only OR opcodes can be disjoint");
  }

  VPInstruction(VPValue *Ptr, VPValue *Offset, GEPFlagsTy Flags = {false},
-VPInstruction(VPValue *Ptr, VPValue *Offset, GEPFlagsTy Flags = {false},
+VPInstruction(VPValue *Ptr, VPValue *Offset, GEPFlagsTy Flags,
default unneeded and may lead to potential ambiguity?
Done, thanks!
DebugLoc DL = {}, const Twine &Name = "")
    : VPRecipeWithIRFlags(VPDef::VPInstructionSC,
                          ArrayRef<VPValue *>({Ptr, Offset}),
                          GEPFlagsTy(Flags), DL),
-GEPFlagsTy(Flags), DL),
+Flags, DL),
?
Done, thanks!
-    } else
-      Idx = State.Builder.getInt32(-Index);
+    } else {
+      // TODO: Drop redundant 0-index GEP as follow-up.
Your call; generating gep with zero also causes some test discrepancies.
auto *Start =
    cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getMember(0)));
VPValue *Addr = Start->getAddr();
if (!VPDT.properlyDominates(Addr->getDefiningRecipe(), InsertPos)) {
Also handle(s) a live-in Addr?
Done by checking if there's a defining recipe, thanks!
// and create a PtrAdd to adjust the index to start at the first member.
APInt Offset(32,
             getLoadStoreType(IRInsertPos)->getScalarSizeInBits() / 8 *
                 IG->getIndex(IRInsertPos),
Worth asserting Offset or index of IRInsertPos is non-zero?
I.e., if insert pos is the first member, its address operand must dominate it.
Added assert, thanks!
        getLoadStorePointerOperand(IRInsertPos)->stripPointerCasts()))
  InBounds = Gep->isInBounds();

// We cannot re-use the address of the first member because it does not
-// We cannot re-use the address of the first member because it does not
+// We cannot re-use the address of member zero because it does not
(may be confusing: insertion point appears "first" (for loads, last for stores); here we mean "first" in terms of memory addresses.)
Updated, thanks!
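To make the "member zero" vs. "first member" distinction concrete, a hypothetical load group; the function and names are invented for illustration:

```cpp
// Within one iteration the load of p[2 * i + 1] comes first in program order,
// so it is the group's insert position (group index 1). p[2 * i] is member
// zero (the lowest address); its GEP is emitted later, so it need not
// dominate the insert position, which is exactly the case where this patch
// falls back to emitting a PtrAdd.
void sum_pairs(const int *p, int *out, int n) {
  for (int i = 0; i < n; i++) {
    int hi = p[2 * i + 1]; // insert position: first member in program order
    int lo = p[2 * i];     // member zero: lowest address in the group
    out[i] = lo + hi;
  }
}
```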
  InBounds = Gep->isInBounds();

// We cannot re-use the address of the first member because it does not
// dominate the insert position. Use the address of the insert position
-// dominate the insert position. Use the address of the insert position
+// dominate the insert position. Instead, use the address of the insert position
Updated, thanks.
// We cannot re-use the address of the first member because it does not
// dominate the insert position. Use the address of the insert position
// and create a PtrAdd to adjust the index to start at the first member.
-// and create a PtrAdd to adjust the index to start at the first member.
+// and create a PtrAdd adjusting it to the address of member zero.
Updated, thanks!
auto *Start =
    cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getMember(0)));
VPValue *Addr = Start->getAddr();
if (!VPDT.properlyDominates(Addr->getDefiningRecipe(), InsertPos)) {
Sure, worth leaving behind a TODO.
* commit 'FETCH_HEAD':
  [X86] getIntImmCostInst - pull out repeated Imm.getBitWidth() calls. NFC.
  [X86] Add test coverage for llvm#111323
  [Driver] Use empty multilib file in another test (llvm#111352)
  [clang][OpenMP][test] Use x86_64-linux-gnu triple for test referencing avx512f feature (llvm#111337)
  [doc] Fix Kaleidoscope tutorial chapter 3 code snippet and full listing discrepancies (llvm#111289)
  [Flang][OpenMP] Improve entry block argument creation and binding (llvm#110267)
  [x86] combineMul - handle 0/-1 KnownBits cases before MUL_IMM logic (REAPPLIED)
  [llvm-dis] Fix non-deterministic disassembly across multiple inputs (llvm#110988)
  [lldb][test] TestDataFormatterLibcxxOptionalSimulator.py: change order of ifdefs
  [lldb][test] Add libcxx-simulators test for std::optional (llvm#111133)
  [x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat. (REAPPLIED)
  Reland "[lldb][test] TestDataFormatterLibcxxStringSimulator.py: add new padding layout" (llvm#111123)
  Revert "[x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat."
  update_test_checks: fix a simple regression (llvm#111347)
  [LegalizeVectorTypes] Always widen fabs (llvm#111298)
  [lsan] Make ReportUnsuspendedThreads return bool also for Fuchsia
  [mlir][vector] Add more tests for ConvertVectorToLLVM (6/n) (llvm#111121)
  [bazel] port 9144fed
  [SystemZ] Remove inlining threshold multiplier. (llvm#106058)
  [LegalizeVectorTypes] When widening don't check for libcalls if promoted (llvm#111297)
  [clang][Driver] Improve multilib custom error reporting (llvm#110804)
  [clang][Driver] Rename "FatalError" key to "Error" in multilib.yaml (llvm#110804)
  [LLVM][Maintainers] Update release managers (llvm#111164)
  [Clang][Driver] Add option to provide path for multilib's YAML config file (llvm#109640)
  [LoopVectorize] Remove redundant code in emitSCEVChecks (llvm#111132)
  [AMDGPU] Only emit SCOPE_SYS global_wb (llvm#110636)
  [ELF] Change Ctx::target to unique_ptr (llvm#111260)
  [ELF] Pass Ctx & to some free functions
  [RISCV] Only disassemble fcvtmod.w.d if the rounding mode is rtz. (llvm#111308)
  [Clang] Remove the special-casing for RequiresExprBodyDecl in BuildResolvedCallExpr() after fd87d76 (llvm#111277)
  [ELF] Pass Ctx & to InputFile
  [clang-format] Add AlignFunctionDeclarations to AlignConsecutiveDeclarations (llvm#108241)
  [AMDGPU] Support preloading hidden kernel arguments (llvm#98861)
  [ELF] Move static nextGroupId isInGroup to LinkerDriver
  [clangd] Add ArgumentLists config option under Completion (llvm#111322)
  [ELF] Pass Ctx & to SyntheticSections
  [ELF] Pass Ctx & to Symbols
  [ELF] Pass Ctx & to Symbols
  [ELF] getRelocTargetVA: pass Ctx and Relocation. NFC
  [clang-tidy] Avoid capturing a local variable in a static lambda in UseRangesCheck (llvm#111282)
  [VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (llvm#106431)
  [clangd] Simplify ternary expressions with std::optional::value_or (NFC) (llvm#111309)
  [libc++][format][2/3] Optimizes c-string arguments. (llvm#101805)
  [RISCV] Combine RVBUnary and RVKUnary into classes that are more similar to ALU(W)_r(r/i). NFC (llvm#111279)
  [ELF] Pass Ctx & to InputFiles
  [libc] GPU RPC interface: add return value to `rpc_host_call` (llvm#111288)

Signed-off-by: kyvangka1610 <[email protected]>
Hi @fhahn, a heads up: some of our tests started failing on aarch64 after this commit. Bisecting with
I looked closer at the original C++ code and it actually looks like a miscompile. The diff in the generated IR before and after this commit (and the follow-up):
I've reduced this to
Hopefully, no important details were lost during reduction.
Use getAllocTypeSize to compute the offset to the start of interleave groups instead of getScalarSizeInBits, which may return 0 for pointers. This is in line with the analysis building the interleave groups and fixes a mis-compile reported for #106431.
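A sketch of what the corrected computation might look like; this is an approximation, not the literal code of bbff5b8, and it reuses names from the patch hunk above (IRInsertPos, IG):

```cpp
// getScalarSizeInBits() returns 0 for pointer-typed members, which made the
// old computation produce a zero offset and thus the wrong member address.
// The DataLayout's alloc size gives the correct byte size for any member type.
const DataLayout &DL = IRInsertPos->getModule()->getDataLayout();
uint64_t MemberBytes =
    DL.getTypeAllocSize(getLoadStoreType(IRInsertPos)).getFixedValue();
APInt Offset(32, MemberBytes * IG->getIndex(IRInsertPos), /*IsSigned=*/true);
```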
Thanks! bbff5b8 resolved the issues we found so far.
This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. For now it only transforms load interleave groups feeding store groups. Depends on #106431. This lands the main parts of the approved #106441 as suggested to break things up a bit more.
@fhahn, would you take care of it?
Should already be fixed
This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on #106431. Fixes #82936. PR: #106441
…106441) This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on llvm/llvm-project#106431. Fixes llvm/llvm-project#82936. PR: llvm/llvm-project#106441
…6441)" This reverts commit ff3e2ba. The recommitted version limits the transform to cases where no interleaving is taking place, to avoid a mis-compile when interleaving. Original commit message: This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on #106431. Fixes #82936. PR: #106441
…roups. (#106441)" This reverts commit ff3e2ba. The recommitted version limits the transform to cases where no interleaving is taking place, to avoid a mis-compile when interleaving. Original commit message: This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on llvm/llvm-project#106431. Fixes llvm/llvm-project#82936. PR: llvm/llvm-project#106441
Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute.
In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.