Skip to content

[InstCombine] Handle trunc i1 pattern in eq-of-parts fold #112704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 25, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Oct 17, 2024

Equality/inequality of the low bit can be represented by (trunc (xor x, y) to i1), possibly with an extra not. We have to handle this in the eq-of-parts fold now that we no longer canonicalize this to a masked icmp.

Proofs: https://alive2.llvm.org/ce/z/qidkzq

Fixes #110919.

@llvmbot
Copy link
Member

llvmbot commented Oct 17, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

Equality/inequality of the low bit can be represented by (trunc (xor x, y) to i1), possibly with an extra not. We have to handle this in the eq-of-parts fold now that we no longer canonicalize this to a masked icmp.

Proofs: https://alive2.llvm.org/ce/z/qidkzq

Fixes #110919.


Full diff: https://github.com/llvm/llvm-project/pull/112704.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+17-6)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineInternal.h (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/eq-of-parts.ll (+2-9)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 8112255a0b6c45..629ac859c06e35 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -1185,14 +1185,25 @@ static Value *extractIntPart(const IntPart &P, IRBuilderBase &Builder) {
 /// (icmp eq X0, Y0) & (icmp eq X1, Y1) -> icmp eq X01, Y01
 /// (icmp ne X0, Y0) | (icmp ne X1, Y1) -> icmp ne X01, Y01
 /// where X0, X1 and Y0, Y1 are adjacent parts extracted from an integer.
-Value *InstCombinerImpl::foldEqOfParts(ICmpInst *Cmp0, ICmpInst *Cmp1,
-                                       bool IsAnd) {
+Value *InstCombinerImpl::foldEqOfParts(Value *Cmp0, Value *Cmp1, bool IsAnd) {
   if (!Cmp0->hasOneUse() || !Cmp1->hasOneUse())
     return nullptr;
 
   CmpInst::Predicate Pred = IsAnd ? CmpInst::ICMP_EQ : CmpInst::ICMP_NE;
-  auto GetMatchPart = [&](ICmpInst *Cmp,
+  auto GetMatchPart = [&](Value *CmpV,
                           unsigned OpNo) -> std::optional<IntPart> {
+    Value *X, *Y;
+    // icmp ne (and x, 1), (and y, 1) <=> trunc (xor x, y) to i1
+    // icmp eq (and x, 1), (and y, 1) <=> not (trunc (xor x, y) to i1)
+    if (Pred == CmpInst::ICMP_NE
+            ? match(CmpV, m_Trunc(m_Xor(m_Value(X), m_Value(Y))))
+            : match(CmpV, m_Not(m_Trunc(m_Xor(m_Value(X), m_Value(Y))))))
+      return {{OpNo == 0 ? X : Y, 0, 1}};
+
+    auto *Cmp = dyn_cast<ICmpInst>(CmpV);
+    if (!Cmp)
+      return std::nullopt;
+
     if (Pred == Cmp->getPredicate())
       return matchIntPart(Cmp->getOperand(OpNo));
 
@@ -3413,9 +3424,6 @@ Value *InstCombinerImpl::foldAndOrOfICmps(ICmpInst *LHS, ICmpInst *RHS,
       return X;
   }
 
-  if (Value *X = foldEqOfParts(LHS, RHS, IsAnd))
-    return X;
-
   // (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0)
   // (icmp eq A, 0) & (icmp eq B, 0) --> (icmp eq (A|B), 0)
   // TODO: Remove this and below when foldLogOpOfMaskedICmps can handle undefs.
@@ -3538,6 +3546,9 @@ Value *InstCombinerImpl::foldBooleanAndOr(Value *LHS, Value *RHS,
       if (Value *Res = foldLogicOfFCmps(LHSCmp, RHSCmp, IsAnd, IsLogical))
         return Res;
 
+  if (Value *Res = foldEqOfParts(LHS, RHS, IsAnd))
+    return Res;
+
   return nullptr;
 }
 
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
index 7a060cdab2d37d..6c38725939f2a6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
+++ b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
@@ -411,7 +411,7 @@ class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final
                           bool IsAnd, bool IsLogical = false);
   Value *foldXorOfICmps(ICmpInst *LHS, ICmpInst *RHS, BinaryOperator &Xor);
 
-  Value *foldEqOfParts(ICmpInst *Cmp0, ICmpInst *Cmp1, bool IsAnd);
+  Value *foldEqOfParts(Value *Cmp0, Value *Cmp1, bool IsAnd);
 
   Value *foldAndOrOfICmpsUsingRanges(ICmpInst *ICmp1, ICmpInst *ICmp2,
                                      bool IsAnd);
diff --git a/llvm/test/Transforms/InstCombine/eq-of-parts.ll b/llvm/test/Transforms/InstCombine/eq-of-parts.ll
index afe5d6af1fcd49..5bcfef812c6bb6 100644
--- a/llvm/test/Transforms/InstCombine/eq-of-parts.ll
+++ b/llvm/test/Transforms/InstCombine/eq-of-parts.ll
@@ -1441,11 +1441,7 @@ define i1 @ne_optimized_highbits_cmp_todo_overlapping(i32 %x, i32 %y) {
 
 define i1 @and_trunc_i1(i8 %a1, i8 %a2) {
 ; CHECK-LABEL: @and_trunc_i1(
-; CHECK-NEXT:    [[XOR:%.*]] = xor i8 [[A1:%.*]], [[A2:%.*]]
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i8 [[XOR]], 2
-; CHECK-NEXT:    [[LOBIT:%.*]] = trunc i8 [[XOR]] to i1
-; CHECK-NEXT:    [[LOBIT_INV:%.*]] = xor i1 [[LOBIT]], true
-; CHECK-NEXT:    [[AND:%.*]] = and i1 [[CMP]], [[LOBIT_INV]]
+; CHECK-NEXT:    [[AND:%.*]] = icmp eq i8 [[A1:%.*]], [[A2:%.*]]
 ; CHECK-NEXT:    ret i1 [[AND]]
 ;
   %xor = xor i8 %a1, %a2
@@ -1494,10 +1490,7 @@ define i1 @and_trunc_i1_wrong_operands(i8 %a1, i8 %a2, i8 %a3) {
 
 define i1 @or_trunc_i1(i64 %a1, i64 %a2) {
 ; CHECK-LABEL: @or_trunc_i1(
-; CHECK-NEXT:    [[XOR:%.*]] = xor i64 [[A2:%.*]], [[A1:%.*]]
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ugt i64 [[XOR]], 1
-; CHECK-NEXT:    [[TRUNC:%.*]] = trunc i64 [[XOR]] to i1
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[CMP]], [[TRUNC]]
+; CHECK-NEXT:    [[OR:%.*]] = icmp ne i64 [[A2:%.*]], [[A1:%.*]]
 ; CHECK-NEXT:    ret i1 [[OR]]
 ;
   %xor = xor i64 %a2, %a1

@goldsteinn
Copy link
Contributor

goldsteinn commented Oct 17, 2024

LGTM (edit: assuming) no regressions on Yingwei's benchmarks.

Also note, some further missing patterns: https://godbolt.org/z/b79Gnrxnx

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Oct 17, 2024
nikic added a commit to nikic/llvm-project that referenced this pull request Nov 13, 2024
We currently support simple reassociation for foldAndOrOfICmps().
Support the same for foldLogicOfFCmps() by going through the
common foldBooleanAndOr() helper.

This will also resolve the regression on llvm#112704, which is also
due to missing reassoc support.

I had to adjust one fold to add support for FMF flag preservation,
otherwise there would be test regressions. There is a separate
fold (reassociateFCmps) handling reassociation for *just* that
specific case and it preserves FMF. Unfortuantely it's not
rendered entirely redundant by this patch, because it handles
one more level of reassociation as well.
nikic added a commit to nikic/llvm-project that referenced this pull request Nov 25, 2024
We currently support simple reassociation for foldAndOrOfICmps().
Support the same for foldLogicOfFCmps() by going through the
common foldBooleanAndOr() helper.

This will also resolve the regression on llvm#112704, which is also
due to missing reassoc support.

I had to adjust one fold to add support for FMF flag preservation,
otherwise there would be test regressions. There is a separate
fold (reassociateFCmps) handling reassociation for *just* that
specific case and it preserves FMF. Unfortuantely it's not
rendered entirely redundant by this patch, because it handles
one more level of reassociation as well.
nikic added a commit that referenced this pull request Nov 25, 2024
We currently support simple reassociation for foldAndOrOfICmps().
Support the same for foldLogicOfFCmps() by going through the common
foldBooleanAndOr() helper.

This will also resolve the regression on #112704, which is also due to
missing reassoc support.

I had to adjust one fold to add support for FMF flag preservation,
otherwise there would be test regressions. There is a separate fold
(reassociateFCmps) handling reassociation for *just* that specific case
and it preserves FMF. Unfortunately it's not rendered entirely redundant
by this patch, because it handles one more level of reassociation as
well.
nikic added a commit that referenced this pull request Nov 25, 2024
Equality/inequality of the low bit can be represented by
`(trunc (xor x, y) to i1)`, possibly with an extra not. We have
to handle this in the eq-of-parts fold now that we no longer
canonicalize this to a masked icmp.

Proofs: https://alive2.llvm.org/ce/z/qidkzq

Fixes llvm#110919.
@nikic nikic force-pushed the instcombine-eq-parts-trunc branch from a9402ae to dd6eff5 Compare November 25, 2024 09:31
@nikic nikic merged commit e477989 into llvm:main Nov 25, 2024
8 checks passed
@nikic nikic deleted the instcombine-eq-parts-trunc branch November 25, 2024 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[InstCombine] Regression in and of masked comparison fold
5 participants