-
Notifications
You must be signed in to change notification settings - Fork 15k
[LV][EVL] Simplify EVL recipe transformation by using a single EVL mask. nfc #152479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Mel Chen (Mel-Chen) ChangesThe EVL mask is always defined as Full diff: https://github.com/llvm/llvm-project/pull/152479.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 0cb704c85ba40..4afaa9c1ece53 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2227,48 +2227,47 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
}
}
- // Try to optimize header mask recipes away to their EVL variants.
+ // Replace header masks with a mask equivalent to predicating by EVL:
+ //
+ // icmp ule widen-canonical-iv backedge-taken-count
+ // ->
+ // icmp ult step-vector, EVL
+ VPRecipeBase *EVLR = EVL.getDefiningRecipe();
+ VPBuilder Builder(EVLR->getParent(), std::next(EVLR->getIterator()));
+ Type *EVLType = TypeInfo.inferScalarType(&EVL);
+ VPValue *EVLMask = Builder.createICmp(
+ CmpInst::ICMP_ULT,
+ Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL);
for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
- // TODO: Split optimizeMaskToEVL out and move into
- // VPlanTransforms::optimize. transformRecipestoEVLRecipes should be run in
- // tryToBuildVPlanWithVPRecipes beforehand.
- for (VPUser *U : collectUsersRecursively(HeaderMask)) {
- auto *CurRecipe = cast<VPRecipeBase>(U);
- VPRecipeBase *EVLRecipe =
- optimizeMaskToEVL(HeaderMask, *CurRecipe, TypeInfo, *AllOneMask, EVL);
- if (!EVLRecipe)
- continue;
-
- [[maybe_unused]] unsigned NumDefVal = EVLRecipe->getNumDefinedValues();
- assert(NumDefVal == CurRecipe->getNumDefinedValues() &&
- "New recipe must define the same number of values as the "
- "original.");
- assert(
- NumDefVal <= 1 &&
- "Only supports recipes with a single definition or without users.");
- EVLRecipe->insertBefore(CurRecipe);
- if (isa<VPSingleDefRecipe, VPWidenLoadEVLRecipe>(EVLRecipe)) {
- VPValue *CurVPV = CurRecipe->getVPSingleValue();
- CurVPV->replaceAllUsesWith(EVLRecipe->getVPSingleValue());
- }
- ToErase.push_back(CurRecipe);
- }
-
- // Replace header masks with a mask equivalent to predicating by EVL:
- //
- // icmp ule widen-canonical-iv backedge-taken-count
- // ->
- // icmp ult step-vector, EVL
- VPRecipeBase *EVLR = EVL.getDefiningRecipe();
- VPBuilder Builder(EVLR->getParent(), std::next(EVLR->getIterator()));
- Type *EVLType = TypeInfo.inferScalarType(&EVL);
- VPValue *EVLMask = Builder.createICmp(
- CmpInst::ICMP_ULT,
- Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL);
HeaderMask->replaceAllUsesWith(EVLMask);
ToErase.push_back(HeaderMask->getDefiningRecipe());
}
+ // Try to optimize header mask recipes away to their EVL variants.
+ // TODO: Split optimizeMaskToEVL out and move into
+ // VPlanTransforms::optimize. transformRecipestoEVLRecipes should be run in
+ // tryToBuildVPlanWithVPRecipes beforehand.
+ for (VPUser *U : collectUsersRecursively(EVLMask)) {
+ auto *CurRecipe = cast<VPRecipeBase>(U);
+ VPRecipeBase *EVLRecipe =
+ optimizeMaskToEVL(EVLMask, *CurRecipe, TypeInfo, *AllOneMask, EVL);
+ if (!EVLRecipe)
+ continue;
+
+ [[maybe_unused]] unsigned NumDefVal = EVLRecipe->getNumDefinedValues();
+ assert(NumDefVal == CurRecipe->getNumDefinedValues() &&
+ "New recipe must define the same number of values as the "
+ "original.");
+ assert(NumDefVal <= 1 &&
+ "Only supports recipes with a single definition or without users.");
+ EVLRecipe->insertBefore(CurRecipe);
+ if (isa<VPSingleDefRecipe, VPWidenLoadEVLRecipe>(EVLRecipe)) {
+ VPValue *CurVPV = CurRecipe->getVPSingleValue();
+ CurVPV->replaceAllUsesWith(EVLRecipe->getVPSingleValue());
+ }
+ ToErase.push_back(CurRecipe);
+ }
+
for (VPRecipeBase *R : reverse(ToErase)) {
SmallVector<VPValue *> PossiblyDead(R->operands());
R->eraseFromParent();
|
VPRecipeBase *EVLR = EVL.getDefiningRecipe(); | ||
VPBuilder Builder(EVLR->getParent(), std::next(EVLR->getIterator())); | ||
Type *EVLType = TypeInfo.inferScalarType(&EVL); | ||
VPValue *EVLMask = Builder.createICmp( | ||
CmpInst::ICMP_ULT, | ||
Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we check before that there is at least a single user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this runs after optimization maybe it's possible it might get optimized away? In that case we would end up with an unused EVLMask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'm pretty sure we do a second run of removeDeadRecipes afterwards so it's probably ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remove it if the mask is dead.
53f0022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This should also make it easier to split up the parts needed for correctness and which parts are just about optimizing away the header mask.
As an aside, its weird to begin with that there might be multiple header masks? From a quick check it only appears to happen when the data tail folding style is used, and we generate both a VPInstruction::ActiveLaneMask and a icmp ule IV, BTC
?. I'll see if we can change this so that we only have one header mask max.
VPRecipeBase *EVLR = EVL.getDefiningRecipe(); | ||
VPBuilder Builder(EVLR->getParent(), std::next(EVLR->getIterator())); | ||
Type *EVLType = TypeInfo.inferScalarType(&EVL); | ||
VPValue *EVLMask = Builder.createICmp( | ||
CmpInst::ICMP_ULT, | ||
Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'm pretty sure we do a second run of removeDeadRecipes afterwards so it's probably ok.
I've opened up #152489 for this |
Yes, but I haven’t looked closely into the reason for having more than one header mask yet, since I’m out of office today. But I think there’s a chance we could unify them into a single header mask. |
} | ||
// Remove dead EVL mask. | ||
if (EVLMask->getNumUsers() == 0) | ||
EVLMask->getDefiningRecipe()->eraseFromParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, should this also be added to ToErase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EVL mask is always defined as
icmp ult (step-vector, EVL)
, so we only need to generate it once per plan in the header. Then, we replace all uses of the header mask with the EVL mask, and recursively optimize the users of EVL mask into EVL recipes. This way, the transformation to EVL recipes can be done with just a single loop.