Mel-Chen commented Aug 7, 2025

The EVL mask is always defined as icmp ult (step-vector, EVL), so we only need to generate it once per plan, in the header. We then replace all uses of the header mask with the EVL mask and recursively optimize the users of the EVL mask into EVL recipes. This way, the transformation to EVL recipes needs only a single loop.
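
To make the mask definition concrete, here is a minimal standalone sketch in plain C++ (no LLVM types). It models icmp ult (step-vector, EVL) lane by lane and checks that a store predicated by that mask touches the same lanes as a store limited to the first EVL lanes, which is the property that lets users of the mask become EVL recipes. The names makeEVLMask, maskedStore, evlStore and storesAgree are hypothetical and exist only for this illustration; the sketch also assumes EVL never exceeds VF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models `icmp ult (step-vector, EVL)`: lane i is active iff i < EVL.
// Hypothetical helper, not an LLVM API.
std::vector<bool> makeEVLMask(std::size_t VF, std::size_t EVL) {
  assert(EVL <= VF && "sketch assumes EVL never exceeds the vector length");
  std::vector<bool> Mask(VF);
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = Lane < EVL; // step-vector is <0, 1, ..., VF-1>
  return Mask;
}

// Mask-predicated store: write Src[i] to Dst[i] only on active lanes.
void maskedStore(std::vector<int> &Dst, const std::vector<int> &Src,
                 const std::vector<bool> &Mask) {
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane)
    if (Mask[Lane])
      Dst[Lane] = Src[Lane];
}

// EVL-predicated store: write only the first EVL lanes, no mask needed.
void evlStore(std::vector<int> &Dst, const std::vector<int> &Src,
              std::size_t EVL) {
  for (std::size_t Lane = 0; Lane < EVL; ++Lane)
    Dst[Lane] = Src[Lane];
}

// Returns true if both forms store to exactly the same lanes.
bool storesAgree(std::size_t VF, std::size_t EVL) {
  std::vector<int> Src(VF, 7), A(VF, 0), B(VF, 0);
  maskedStore(A, Src, makeEVLMask(VF, EVL));
  evlStore(B, Src, EVL);
  return A == B;
}
```

Because the mask depends only on EVL and the lane index, it is the same value for every user, which is why a single definition per plan is enough.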

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Full diff: https://github.com/llvm/llvm-project/pull/152479.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+36-37)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 0cb704c85ba40..4afaa9c1ece53 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2227,48 +2227,47 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
     }
   }
 
-  // Try to optimize header mask recipes away to their EVL variants.
+  // Replace header masks with a mask equivalent to predicating by EVL:
+  //
+  // icmp ule widen-canonical-iv backedge-taken-count
+  // ->
+  // icmp ult step-vector, EVL
+  VPRecipeBase *EVLR = EVL.getDefiningRecipe();
+  VPBuilder Builder(EVLR->getParent(), std::next(EVLR->getIterator()));
+  Type *EVLType = TypeInfo.inferScalarType(&EVL);
+  VPValue *EVLMask = Builder.createICmp(
+      CmpInst::ICMP_ULT,
+      Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL);
   for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
-    // TODO: Split optimizeMaskToEVL out and move into
-    // VPlanTransforms::optimize. transformRecipestoEVLRecipes should be run in
-    // tryToBuildVPlanWithVPRecipes beforehand.
-    for (VPUser *U : collectUsersRecursively(HeaderMask)) {
-      auto *CurRecipe = cast<VPRecipeBase>(U);
-      VPRecipeBase *EVLRecipe =
-          optimizeMaskToEVL(HeaderMask, *CurRecipe, TypeInfo, *AllOneMask, EVL);
-      if (!EVLRecipe)
-        continue;
-
-      [[maybe_unused]] unsigned NumDefVal = EVLRecipe->getNumDefinedValues();
-      assert(NumDefVal == CurRecipe->getNumDefinedValues() &&
-             "New recipe must define the same number of values as the "
-             "original.");
-      assert(
-          NumDefVal <= 1 &&
-          "Only supports recipes with a single definition or without users.");
-      EVLRecipe->insertBefore(CurRecipe);
-      if (isa<VPSingleDefRecipe, VPWidenLoadEVLRecipe>(EVLRecipe)) {
-        VPValue *CurVPV = CurRecipe->getVPSingleValue();
-        CurVPV->replaceAllUsesWith(EVLRecipe->getVPSingleValue());
-      }
-      ToErase.push_back(CurRecipe);
-    }
-
-    // Replace header masks with a mask equivalent to predicating by EVL:
-    //
-    // icmp ule widen-canonical-iv backedge-taken-count
-    // ->
-    // icmp ult step-vector, EVL
-    VPRecipeBase *EVLR = EVL.getDefiningRecipe();
-    VPBuilder Builder(EVLR->getParent(), std::next(EVLR->getIterator()));
-    Type *EVLType = TypeInfo.inferScalarType(&EVL);
-    VPValue *EVLMask = Builder.createICmp(
-        CmpInst::ICMP_ULT,
-        Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL);
     HeaderMask->replaceAllUsesWith(EVLMask);
     ToErase.push_back(HeaderMask->getDefiningRecipe());
   }
 
+  // Try to optimize header mask recipes away to their EVL variants.
+  // TODO: Split optimizeMaskToEVL out and move into
+  // VPlanTransforms::optimize. transformRecipestoEVLRecipes should be run in
+  // tryToBuildVPlanWithVPRecipes beforehand.
+  for (VPUser *U : collectUsersRecursively(EVLMask)) {
+    auto *CurRecipe = cast<VPRecipeBase>(U);
+    VPRecipeBase *EVLRecipe =
+        optimizeMaskToEVL(EVLMask, *CurRecipe, TypeInfo, *AllOneMask, EVL);
+    if (!EVLRecipe)
+      continue;
+
+    [[maybe_unused]] unsigned NumDefVal = EVLRecipe->getNumDefinedValues();
+    assert(NumDefVal == CurRecipe->getNumDefinedValues() &&
+           "New recipe must define the same number of values as the "
+           "original.");
+    assert(NumDefVal <= 1 &&
+           "Only supports recipes with a single definition or without users.");
+    EVLRecipe->insertBefore(CurRecipe);
+    if (isa<VPSingleDefRecipe, VPWidenLoadEVLRecipe>(EVLRecipe)) {
+      VPValue *CurVPV = CurRecipe->getVPSingleValue();
+      CurVPV->replaceAllUsesWith(EVLRecipe->getVPSingleValue());
+    }
+    ToErase.push_back(CurRecipe);
+  }
+
   for (VPRecipeBase *R : reverse(ToErase)) {
     SmallVector<VPValue *> PossiblyDead(R->operands());
     R->eraseFromParent();
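
The reordering in the diff can be pictured with a toy def-use graph. The sketch below is not VPlan code: Node, addOperand, replaceAllUsesWith and collectUsersRecursively are simplified stand-ins written for this illustration, and the real helpers in VPlanTransforms.cpp may behave differently (for example in which users they skip). It only shows that once every header mask has been redirected to one shared EVL mask, a single walk from that mask reaches all affected recipes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy def-use node; it only mimics the shape of a VPlan value.
struct Node {
  std::vector<Node *> Operands;
  std::vector<Node *> Users;
};

// Record that User reads Op.
void addOperand(Node &User, Node &Op) {
  User.Operands.push_back(&Op);
  Op.Users.push_back(&User);
}

// Redirect every use of Old to New, leaving Old without users.
void replaceAllUsesWith(Node &Old, Node &New) {
  for (Node *U : Old.Users) {
    std::replace(U->Operands.begin(), U->Operands.end(), &Old, &New);
    New.Users.push_back(U);
  }
  Old.Users.clear();
}

// Collect the transitive users of Root, each one once, in discovery order.
std::vector<Node *> collectUsersRecursively(const Node &Root) {
  std::vector<Node *> Seen;
  auto Push = [&Seen](Node *N) {
    if (std::find(Seen.begin(), Seen.end(), N) == Seen.end())
      Seen.push_back(N);
  };
  for (Node *U : Root.Users)
    Push(U);
  for (std::size_t I = 0; I != Seen.size(); ++I)
    for (Node *U : Seen[I]->Users)
      Push(U);
  return Seen;
}

// Two header masks feed three recipes; returns how many recipes one walk
// from the shared EVL mask visits.
std::size_t visitedFromSingleEVLMask() {
  Node HeaderMaskA, HeaderMaskB, EVLMask, Load, Store, Select;
  addOperand(Load, HeaderMaskA);
  addOperand(Store, HeaderMaskB);
  addOperand(Select, Load);
  replaceAllUsesWith(HeaderMaskA, EVLMask);
  replaceAllUsesWith(HeaderMaskB, EVLMask);
  assert(HeaderMaskA.Users.empty() && HeaderMaskB.Users.empty());
  return collectUsersRecursively(EVLMask).size();
}
```

In this toy setup the single walk visits the load, the store and the select that consumes the load, without iterating once per header mask.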

Comment on lines +2235 to +2255
  VPRecipeBase *EVLR = EVL.getDefiningRecipe();
  VPBuilder Builder(EVLR->getParent(), std::next(EVLR->getIterator()));
  Type *EVLType = TypeInfo.inferScalarType(&EVL);
  VPValue *EVLMask = Builder.createICmp(
      CmpInst::ICMP_ULT,
      Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL);
Member

Shall we check before that there is at least a single user?

Contributor

Given that this runs after optimization, maybe it could get optimized away? In that case we would end up with an unused EVLMask.

Contributor

Although I'm pretty sure we do a second run of removeDeadRecipes afterwards, so it's probably OK.

Contributor Author

I now remove the EVL mask if it is dead (53f0022).

lukel97 left a comment

LGTM. This should also make it easier to separate the parts needed for correctness from the parts that only optimize away the header mask.

As an aside, it's weird to begin with that there might be multiple header masks. From a quick check, it only appears to happen when the data tail folding style is used, and we generate both a VPInstruction::ActiveLaneMask and an icmp ule IV, BTC. I'll see if we can change this so that we have at most one header mask.
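
As a rough illustration of why the two header masks mentioned above are redundant, the sketch below models both forms in plain C++. It assumes the active lane mask tests IV + i < trip count (as the llvm.get.active.lane.mask intrinsic is documented to do), that BTC equals the trip count minus one, and that nothing overflows. The function names are hypothetical and used only here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Active-lane-mask form: lane i is active iff IV + i < TripCount.
std::vector<bool> activeLaneMask(std::uint64_t IV, std::uint64_t TripCount,
                                 unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = IV + Lane < TripCount;
  return Mask;
}

// Compare form: lane i is active iff IV + i <= BTC.
std::vector<bool> icmpUleMask(std::uint64_t IV, std::uint64_t BTC,
                              unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = IV + Lane <= BTC;
  return Mask;
}

// Returns true if both forms select the same lanes for this iteration.
bool headerMasksAgree(std::uint64_t IV, std::uint64_t TripCount, unsigned VF) {
  assert(TripCount > 0 && "BTC = TripCount - 1 needs a non-zero trip count");
  return activeLaneMask(IV, TripCount, VF) ==
         icmpUleMask(IV, TripCount - 1, VF);
}
```

Under these assumptions both masks describe the same set of active lanes, so replacing each of them with the one EVL mask loses nothing.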

lukel97 commented Aug 7, 2025

> As an aside, it's weird to begin with that there might be multiple header masks. From a quick check, it only appears to happen when the data tail folding style is used, and we generate both a VPInstruction::ActiveLaneMask and an icmp ule IV, BTC. I'll see if we can change this so that we have at most one header mask.

I've opened #152489 for this.

Mel-Chen commented Aug 8, 2025

> As an aside, it's weird to begin with that there might be multiple header masks. From a quick check, it only appears to happen when the data tail folding style is used, and we generate both a VPInstruction::ActiveLaneMask and an icmp ule IV, BTC. I'll see if we can change this so that we have at most one header mask.

Yes, but I haven’t looked closely into the reason for having more than one header mask yet, since I’m out of office today. But I think there’s a chance we could unify them into a single header mask.

Mel-Chen requested a review from alexey-bataev August 8, 2025 12:10
Mel-Chen merged commit 6db3776 into llvm:main Aug 11, 2025
9 checks passed
  }
  // Remove dead EVL mask.
  if (EVLMask->getNumUsers() == 0)
    EVLMask->getDefiningRecipe()->eraseFromParent();
Contributor

For consistency, should this also be added to ToErase?

Mel-Chen deleted the nfc-evl-mask branch August 15, 2025 07:59