Conversation

Member
@XChy XChy commented Sep 22, 2023

This patch extends foldCmpLoadFromIndexedGlobal() and switches to a byte-driven method to fold IR like the following:

define i1 @cmp_load_constant_array0(i64 %x){
entry:
  %cond = icmp ult i64 %x, 2
  br i1 %cond, label %case1, label %case2

case2:
  ret i1 0

case1:
  %isOK_ptr = getelementptr inbounds i32, ptr @CG, i64 %x
  %isOK = load i32, ptr %isOK_ptr
  %cond_inferred = icmp ult i32 %isOK, 3
  ret i1 %cond_inferred
}

Proof:
alive2

Related issue:
#64238

Migrated from Phabricator

@nikic nikic requested a review from dtcxzyw December 22, 2023 12:02
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Dec 22, 2023

github-actions bot commented Dec 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

if (BeginOffset.slt(0))
  BeginOffset += OffsetStep;

uint64_t ElementCountToTraverse =
    (DataSize - BeginOffset).udiv(OffsetStep).getZExtValue() + 1;
Member Author

I missed the "+1" here; BeginOffset actually includes one more element. Fixed now.

@XChy XChy force-pushed the IndexedGlobal branch 2 times, most recently from d78ded4 to 353c5f3 Compare December 23, 2023 18:48
// If the index is larger than the pointer offset size of the target,
// truncate the index down like the GEP would do implicitly. We don't have
// to do this for an inbounds GEP because the index can't be out of range.
if (!GEP->isInBounds() && IdxBitWidth > IndexSize)
Member

Since we canonicalize the index type of GEPs, I think we can skip the transform when IdxBitWidth != IndexSize.

Value *Idx = ConstantInt::get(
PtrIdxTy, (ConstantOffset - BeginOffset).sdiv(OffsetStep));
uint64_t IdxBitWidth = Idx->getType()->getScalarSizeInBits();
for (auto [Var, Coefficient] : VariableOffsets) {
Member

The size of VariableOffsets is 1.


if (Ty) {
  Idx = MaskIdx(Idx);
  Idx = LazyGetIndex(Idx);
  Value *V = Builder.CreateIntCast(Idx, Ty, false);
  V = Builder.CreateLShr(ConstantInt::get(Ty, MagicBitvector), V);
  V = Builder.CreateAnd(ConstantInt::get(Ty, 1), V);
Member

Please avoid creating multiple instructions when the load has multiple users.
See also https://github.com/dtcxzyw/llvm-opt-benchmark/pull/28/files/f845e103a2e2a78409e1f2aed9e21733056fd134#r1435245448.
But it would be good to do it in a separate PR.

Member Author

OK, I will post a separate PR for it later.

@XChy XChy force-pushed the IndexedGlobal branch 3 times, most recently from 956ee50 to 992f410 Compare December 29, 2023 13:04
auto [Var, Coefficient] = VariableOffsets.front();
uint64_t VarBitWidth = Var->getType()->getScalarSizeInBits();
assert(VarBitWidth == IdxBitWidth &&
       "GEP indices do not get canonicalized to the index type");
Contributor

You should check this condition at the start of the transform and bail out. While it is canonicalized, there's no guarantee that the GEP is canonicalized at this point yet.

// idx < 3, we actually get x + 3 < 3
Value *Bias = ConstantInt::get(
PtrIdxTy, (ConstantOffset - BeginOffset).sdiv(OffsetStep));
uint64_t IdxBitWidth = PtrIdxTy->getScalarSizeInBits();
Contributor

This is the same as the IndexSize variable.

;
entry:
%cond = icmp ult i64 %x, 2
br i1 %cond, label %case1, label %case2
Contributor

I think it would be better to omit this condition from the test, so we can see the direct result of the transform, without additional implication reasoning. Same for the next test.

%isOK = load i32, ptr %isOK_ptr
%cond_inferred = icmp ult i32 %isOK, %y
ret i1 %cond_inferred
}
Contributor

I'd like to see some tests where we also apply an additional offset. In particular also if the offset is greater than the stride, and if the offset is negative. (Preferably for a "non-messy" case, to make it understandable.)

nikic added a commit to nikic/llvm-project that referenced this pull request Sep 8, 2025
nikic added a commit that referenced this pull request Sep 8, 2025
#157089)

foldCmpLoadFromIndexedGlobal() currently checks that the global type,
the GEP type and the load type match in certain ways. Replace this with
generic logic based on offsets.

This is a reboot of #67093.
This PR is less ambitious by requiring that the constant offset is
smaller than the stride, which avoids the additional complexity of that
PR.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 8, 2025
Successfully merging this pull request may close these issues.

[InstCombine] Missed optimization for icmp(constants[x]) when the range of x is implied