[VPlan] Handle FirstActiveLane when unrolling. #145394

Merged
merged 6 commits into from
Jun 27, 2025
4 changes: 3 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlan.h
@@ -964,7 +964,9 @@ class VPInstruction : public VPRecipeWithIRFlags,
// all unrolled iterations. Unrolling will add all copies of its original
// operand as additional operands.
AnyOf,
- // Calculates the first active lane index of the vector predicate operand.
+ // Calculates the first active lane index of the vector predicate operands.
+ // It produces the lane index across all unrolled iterations. Unrolling will
+ // add all copies of its original operand as additional operands.
FirstActiveLane,

// The opcodes below are used for VPInstructionWithType.
29 changes: 26 additions & 3 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -856,9 +856,32 @@ Value *VPInstruction::generate(VPTransformState &State) {
return Builder.CreateOrReduce(Res);
}
case VPInstruction::FirstActiveLane: {
- Value *Mask = State.get(getOperand(0));
- return Builder.CreateCountTrailingZeroElems(Builder.getInt64Ty(), Mask,
- true, Name);
+ if (getNumOperands() == 1) {
Collaborator

Suggested change
- if (getNumOperands() == 1) {
+ auto Int64 = Builder.getInt64Ty();
+ if (getNumOperands() == 1) {

to be reused in multiple cases below.

+ Value *Mask = State.get(getOperand(0));
+ return Builder.CreateCountTrailingZeroElems(Builder.getInt64Ty(), Mask,
+ true, Name);
Comment on lines +861 to +862
Collaborator

Suggested change
- return Builder.CreateCountTrailingZeroElems(Builder.getInt64Ty(), Mask,
- true, Name);
+ Value *LastActiveLaneInMask = Builder.CreateCountTrailingZeroElems(
+ Int64, Mask, /* ZeroIsPoison */ true, Name);
+ return LastActiveLaneInMask;

consistent with below.

+ }
+ // If there are multiple operands, create a chain of selects to pick the
+ // first operand with an active lane and add the number of lanes of the
+ // preceding operands.
Comment on lines +864 to +866
Collaborator

Suggested change
- // If there are multiple operands, create a chain of selects to pick the
- // first operand with an active lane and add the number of lanes of the
- // preceding operands.
+ // If there are multiple operands, create a chain of selects to pick the
+ // first operand with an active lane and add the number of lanes of the
+ // preceding operands. Iterate over the operands backwards overwriting the
+ // result whenever an active lane is found.

+ Value *RuntimeVF =
+ getRuntimeVF(State.Builder, State.Builder.getInt64Ty(), State.VF);
Comment on lines +867 to +868
Collaborator

Suggested change
- Value *RuntimeVF =
- getRuntimeVF(State.Builder, State.Builder.getInt64Ty(), State.VF);
+ Value *RuntimeVF = getRuntimeVF(Builder, Int64, State.VF);

Use Builder here, as above and below, rather than State.Builder (the former is set to the latter).

+ unsigned LastOpIdx = getNumOperands() - 1;
+ Value *Res = nullptr;
+ for (int Idx = LastOpIdx; Idx >= 0; --Idx) {
Collaborator

Can something like this

Suggested change
- for (int Idx = LastOpIdx; Idx >= 0; --Idx) {
+ for (VPValue *Operand : reverse(operands())) {

work instead?
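
One thing to keep in mind with this suggestion is that the loop body multiplies RuntimeVF by Idx, so a range-based reverse loop would still need the operand index from somewhere. The following is a minimal sketch in plain standard C++ (not LLVM's `reverse` helper) of walking a sequence backwards while carrying the original index; `reverseWithIndex` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: visits the elements of Ops from last to first and
// records each element together with its original (forward) index.
template <typename T>
std::vector<std::pair<std::size_t, T>>
reverseWithIndex(const std::vector<T> &Ops) {
  std::vector<std::pair<std::size_t, T>> Out;
  std::size_t Idx = Ops.size();
  for (auto It = Ops.rbegin(); It != Ops.rend(); ++It)
    Out.emplace_back(--Idx, *It); // Idx counts down alongside the iterator
  return Out;
}
```

For a three-element input the pairs come out with indices 2, 1, 0, which is the order the existing index-based loop visits the operands in.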

+ Value *TrailingZeros = Builder.CreateCountTrailingZeroElems(
+ Builder.getInt64Ty(), State.get(getOperand(Idx)), true, Name);
Comment on lines +872 to +873
Collaborator

Suggested change
- Value *TrailingZeros = Builder.CreateCountTrailingZeroElems(
- Builder.getInt64Ty(), State.get(getOperand(Idx)), true, Name);
+ Value *Mask = State.get(getOperand(Idx));
+ Value *LastActiveLaneInMask = Builder.CreateCountTrailingZeroElems(
+ Int64, Mask, /* ZeroIsPoison */ true, Name);

+ Value *Current = Builder.CreateAdd(
+ Builder.CreateMul(RuntimeVF, Builder.getInt64(Idx)), TrailingZeros);
Comment on lines +874 to +875
Collaborator

Suggested change
- Value *Current = Builder.CreateAdd(
- Builder.CreateMul(RuntimeVF, Builder.getInt64(Idx)), TrailingZeros);
+ Value *NumPrecedingLanes = Builder.CreateMul(RuntimeVF, Builder.getInt64(Idx));
+ Value *LastActiveLane = Builder.CreateAdd(NumPrecedingLanes, LastActiveLaneInMask);

splitting is easier to read, as in cmp-select below?

+ if (Res) {
+ Value *Cmp = Builder.CreateICmpNE(TrailingZeros, RuntimeVF);
+ Res = Builder.CreateSelect(Cmp, Current, Res);
+ } else {
+ Res = Current;
+ }
Comment on lines +876 to +881
Collaborator

Suggested change
- if (Res) {
- Value *Cmp = Builder.CreateICmpNE(TrailingZeros, RuntimeVF);
- Res = Builder.CreateSelect(Cmp, Current, Res);
- } else {
- Res = Current;
- }
+ if (!Result) {
+ Result = LastActiveLane;
+ } else {
+ Value *AnyActiveLaneInMask = Builder.CreateICmpNE(LastActiveLaneInMask, RuntimeVF);
+ Result = Builder.CreateSelect(AnyActiveLaneInMask, LastActiveLane, Result);
+ }

+ }
+
+ return Res;
}
default:
llvm_unreachable("Unsupported opcode for instruction");
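
As a rough illustration of the select chain in the FirstActiveLane case above, here is a minimal scalar sketch in plain C++. It is not the LLVM code: `Mask`, `countTrailingZeroElems`, and `firstActiveLane` are hypothetical names, each unrolled part's mask is modelled as a `std::vector<bool>` with VF lanes, and the stand-in for the trailing-zero count is assumed to return the lane count for an all-false mask (the diff passes `true` for the zero-is-poison argument, which this model does not reproduce).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: one mask per unrolled part, each with exactly VF lanes.
using Mask = std::vector<bool>;

// Scalar stand-in for counting trailing zero elements: index of the first
// true lane, or the number of lanes when no lane is set.
std::uint64_t countTrailingZeroElems(const Mask &M) {
  std::uint64_t I = 0;
  while (I < M.size() && !M[I])
    ++I;
  return I;
}

// Walks the parts from last to first. The last part seeds the result; every
// earlier part overwrites it if that part has an active lane.
std::uint64_t firstActiveLane(const std::vector<Mask> &Parts,
                              std::uint64_t VF) {
  std::uint64_t Res = 0;
  bool HaveRes = false;
  for (int Idx = static_cast<int>(Parts.size()) - 1; Idx >= 0; --Idx) {
    std::uint64_t TrailingZeros = countTrailingZeroElems(Parts[Idx]);
    // Lanes of all preceding parts plus the lane index within this part.
    std::uint64_t Current =
        VF * static_cast<std::uint64_t>(Idx) + TrailingZeros;
    if (!HaveRes) {
      Res = Current;
      HaveRes = true;
    } else {
      Res = (TrailingZeros != VF) ? Current : Res;
    }
  }
  return Res;
}
```

With VF = 4 and three parts where part 0 is all false, part 1 has lane 2 set, and part 2 has lane 0 set, the sketch returns 6: the 4 lanes of part 0 plus lane 2 within part 1.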
6 changes: 4 additions & 2 deletions llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -345,10 +345,12 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
if (ToSkip.contains(&R) || isa<VPIRInstruction>(&R))
continue;

- // Add all VPValues for all parts to AnyOf and Compute*Result which combine
- // all parts to compute the final value.
+ // Add all VPValues for all parts to AnyOf, FirstActiveLane and
+ // Compute*Result which combine all parts to compute the final value.
VPValue *Op1;
if (match(&R, m_VPInstruction<VPInstruction::AnyOf>(m_VPValue(Op1))) ||
+ match(&R, m_VPInstruction<VPInstruction::FirstActiveLane>(
+ m_VPValue(Op1))) ||
match(&R, m_VPInstruction<VPInstruction::ComputeAnyOfResult>(
m_VPValue(), m_VPValue(), m_VPValue(Op1))) ||
match(&R, m_VPInstruction<VPInstruction::ComputeReductionResult>(
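
The hunk above is cut off before the code that acts on a match, so the following is only a toy sketch of the idea stated in the comment: a recipe that combines all parts gets one extra operand per additional unrolled part. `ToyRecipe` and `addOperandsForParts` are hypothetical names, operands are modelled as strings, and the `.partN` suffix is an invented labelling scheme rather than anything VPlan produces.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a recipe: an opcode name and its operands.
struct ToyRecipe {
  std::string Opcode;
  std::vector<std::string> Operands;
};

// Appends one copy of the original first operand for each of parts
// 1..UF-1, so the recipe ends up with UF operands in part order.
void addOperandsForParts(ToyRecipe &R, unsigned UF) {
  if (R.Operands.empty())
    return;
  const std::string Op0 = R.Operands.front(); // copy before growing the vector
  for (unsigned Part = 1; Part < UF; ++Part)
    R.Operands.push_back(Op0 + ".part" + std::to_string(Part));
}
```

In this model a FirstActiveLane recipe with one mask operand and UF = 4 ends up with four operands, which is the multi-operand shape the code generation in VPlanRecipes.cpp then handles.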
@@ -77,8 +77,27 @@ define i64 @same_exit_block_pre_inc_use1() #0 {
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 510, [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_END:%.*]], label [[SCALAR_PH]]
; CHECK: vector.early.exit:
+ ; CHECK-NEXT: [[TMP63:%.*]] = call i64 @llvm.vscale.i64()
+ ; CHECK-NEXT: [[TMP42:%.*]] = mul nuw i64 [[TMP63]], 16
+ ; CHECK-NEXT: [[TMP44:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP32]], i1 true)
+ ; CHECK-NEXT: [[TMP62:%.*]] = mul i64 [[TMP42]], 3
+ ; CHECK-NEXT: [[TMP45:%.*]] = add i64 [[TMP62]], [[TMP44]]
+ ; CHECK-NEXT: [[TMP46:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP31]], i1 true)
+ ; CHECK-NEXT: [[TMP58:%.*]] = mul i64 [[TMP42]], 2
+ ; CHECK-NEXT: [[TMP50:%.*]] = add i64 [[TMP58]], [[TMP46]]
+ ; CHECK-NEXT: [[TMP47:%.*]] = icmp ne i64 [[TMP46]], [[TMP42]]
+ ; CHECK-NEXT: [[TMP51:%.*]] = select i1 [[TMP47]], i64 [[TMP50]], i64 [[TMP45]]
+ ; CHECK-NEXT: [[TMP52:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP30]], i1 true)
+ ; CHECK-NEXT: [[TMP64:%.*]] = mul i64 [[TMP42]], 1
+ ; CHECK-NEXT: [[TMP56:%.*]] = add i64 [[TMP64]], [[TMP52]]
+ ; CHECK-NEXT: [[TMP53:%.*]] = icmp ne i64 [[TMP52]], [[TMP42]]
+ ; CHECK-NEXT: [[TMP57:%.*]] = select i1 [[TMP53]], i64 [[TMP56]], i64 [[TMP51]]
; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP11]], i1 true)
- ; CHECK-NEXT: [[TMP16:%.*]] = add i64 [[INDEX1]], [[TMP15]]
+ ; CHECK-NEXT: [[TMP65:%.*]] = mul i64 [[TMP42]], 0
+ ; CHECK-NEXT: [[TMP60:%.*]] = add i64 [[TMP65]], [[TMP15]]
+ ; CHECK-NEXT: [[TMP59:%.*]] = icmp ne i64 [[TMP15]], [[TMP42]]
+ ; CHECK-NEXT: [[TMP61:%.*]] = select i1 [[TMP59]], i64 [[TMP60]], i64 [[TMP57]]
+ ; CHECK-NEXT: [[TMP16:%.*]] = add i64 [[INDEX1]], [[TMP61]]
; CHECK-NEXT: [[TMP17:%.*]] = add i64 3, [[TMP16]]
; CHECK-NEXT: br label [[LOOP_END]]
; CHECK: scalar.ph:
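
To make the arithmetic in the vector.early.exit CHECK lines above easier to follow, here is a straight-line scalar trace in C++ that mirrors them value by value. It is a sketch under assumptions: `EarlyExitTrace` and `traceEarlyExit` are hypothetical names, vscale, the induction value, and the four trailing-zero counts are passed in as plain integers, and an all-false mask is assumed to yield the lane count (the IR passes `i1 true` for the poison flag, which this model ignores).

```cpp
#include <cstdint>

struct EarlyExitTrace {
  std::uint64_t Lane; // models [[TMP61]]
  std::uint64_t Exit; // models [[TMP17]]
};

// Cttz0..Cttz3 model the cttz.elts results for parts 0..3
// ([[TMP15]], [[TMP52]], [[TMP46]], [[TMP44]] respectively).
EarlyExitTrace traceEarlyExit(std::uint64_t VScale, std::uint64_t Index1,
                              std::uint64_t Cttz0, std::uint64_t Cttz1,
                              std::uint64_t Cttz2, std::uint64_t Cttz3) {
  const std::uint64_t VF = VScale * 16;                     // [[TMP42]]
  const std::uint64_t Part3 = VF * 3 + Cttz3;               // [[TMP45]]
  const std::uint64_t Part2 = VF * 2 + Cttz2;               // [[TMP50]]
  const std::uint64_t Sel2 = (Cttz2 != VF) ? Part2 : Part3; // [[TMP51]]
  const std::uint64_t Part1 = VF * 1 + Cttz1;               // [[TMP56]]
  const std::uint64_t Sel1 = (Cttz1 != VF) ? Part1 : Sel2;  // [[TMP57]]
  const std::uint64_t Part0 = VF * 0 + Cttz0;               // [[TMP60]]
  const std::uint64_t Sel0 = (Cttz0 != VF) ? Part0 : Sel1;  // [[TMP61]]
  const std::uint64_t Tmp16 = Index1 + Sel0;                // [[TMP16]]
  return {Sel0, 3 + Tmp16};                                 // [[TMP17]]
}
```

For example, with vscale = 1 (so 16 lanes per part), an induction value of 64, and only part 2 having an active lane at index 5, the selected lane is 2 * 16 + 5 = 37 and the exit value is 3 + 64 + 37 = 104.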