Skip to content

[InstCombine] Factorise Add and Min/Max using Distributivity #101717

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 2, 2024

Conversation

jf-botto
Copy link
Contributor

@jf-botto jf-botto commented Aug 2, 2024

This PR fixes part of #92433.

It specifically adds the 4 cases mentioned in #92433 (comment).

I've added 8 positive tests, 4 of which are mentioned in the comment above and 4 which are their commutative equivalents. Alive proof: https://alive2.llvm.org/ce/z/z6eFTb
I've also added 8 negative tests, because we want to make sure we do not optimise if the relevant flags are not relevant because the optimisation wouldn't be sound. Alive proof that the optimisation is invalid: https://alive2.llvm.org/ce/z/NvNjTD
I did have to make the integer types i4 to make Alive not timeout and to fit them all on one page.

@jf-botto jf-botto requested a review from nikic as a code owner August 2, 2024 17:24
Copy link

github-actions bot commented Aug 2, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Aug 2, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Jorge Botto (jf-botto)

Changes

This PR fixes part of #92433.

It specifically adds the 4 cases mentioned in #92433 (comment).

I've added 8 positive tests, 4 of which are mentioned in the comment above and 4 which are their commutative equivalents. Alive proof: https://alive2.llvm.org/ce/z/z6eFTb
I've also added 8 negative tests, because we want to make sure we do not optimise if the relevant flags are not relevant because the optimisation wouldn't be sound. Alive proof: https://alive2.llvm.org/ce/z/NvNjTD
I did have to make the integer types i4 to make Alive not timeout and to fit them all on one page.


Full diff: https://github.com/llvm/llvm-project/pull/101717.diff

3 Files Affected:

  • (modified) llvm/include/llvm/IR/Operator.h (+3)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (+94)
  • (added) llvm/test/Transforms/InstCombine/intrinsic-distributive.ll (+228)
diff --git a/llvm/include/llvm/IR/Operator.h b/llvm/include/llvm/IR/Operator.h
index f63f54ef94107..ec8b3f4b6318f 100644
--- a/llvm/include/llvm/IR/Operator.h
+++ b/llvm/include/llvm/IR/Operator.h
@@ -123,6 +123,9 @@ class OverflowingBinaryOperator : public Operator {
     return NoWrapKind;
   }
 
+  /// Return true if the instruction is commutative:
+  bool isCommutative() const { return Instruction::isCommutative(getOpcode()); }
+
   static bool classof(const Instruction *I) {
     return I->getOpcode() == Instruction::Add ||
            I->getOpcode() == Instruction::Sub ||
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index cc68fd4cf1c1b..8944eec2d63d4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -1505,6 +1505,97 @@ foldMinimumOverTrailingOrLeadingZeroCount(Value *I0, Value *I1,
       ConstantInt::getTrue(ZeroUndef->getType()));
 }
 
+/// Return whether "X LOp (Y ROp Z)" is always equal to
+/// "(X LOp Y) ROp (X LOp Z)".
+static bool leftDistributesOverRightIntrinsic(Instruction::BinaryOps LOp,
+                                              bool hasNUW, bool hasNSW,
+                                              Intrinsic::ID ROp) {
+  switch (ROp) {
+  case Intrinsic::umax:
+    return hasNUW && LOp == Instruction::Add;
+  case Intrinsic::umin:
+    return hasNUW && LOp == Instruction::Add;
+  case Intrinsic::smax:
+    return hasNSW && LOp == Instruction::Add;
+  case Intrinsic::smin:
+    return hasNSW && LOp == Instruction::Add;
+  default:
+    return false;
+  }
+}
+
+// Attempts to factorise a common term
+// in an instruction that has the form "(A op' B) op (C op' D)
+// where op is an intrinsic and op' is a binop
+static Value *
+foldIntrinsicUsingDistributiveLaws(IntrinsicInst *II,
+                                   InstCombiner::BuilderTy &Builder) {
+  Value *LHS = II->getOperand(0), *RHS = II->getOperand(1);
+  Intrinsic::ID TopLevelOpcode = II->getIntrinsicID();
+
+  OverflowingBinaryOperator *Op0 = dyn_cast<OverflowingBinaryOperator>(LHS);
+  OverflowingBinaryOperator *Op1 = dyn_cast<OverflowingBinaryOperator>(RHS);
+
+  if (!Op0 || !Op1)
+    return nullptr;
+
+  if (Op0->getOpcode() != Op1->getOpcode())
+    return nullptr;
+
+  if (!(Op0->hasNoUnsignedWrap() == Op1->hasNoUnsignedWrap()) ||
+      !(Op0->hasNoSignedWrap() == Op1->hasNoSignedWrap()))
+    return nullptr;
+
+  if (!Op0->hasOneUse() || !Op1->hasOneUse())
+    return nullptr;
+
+  Instruction::BinaryOps InnerOpcode =
+      static_cast<Instruction::BinaryOps>(Op0->getOpcode());
+  bool HasNUW = Op0->hasNoUnsignedWrap();
+  bool HasNSW = Op0->hasNoSignedWrap();
+
+  if (!InnerOpcode)
+    return nullptr;
+
+  if (!leftDistributesOverRightIntrinsic(InnerOpcode, HasNUW, HasNSW,
+                                         TopLevelOpcode))
+    return nullptr;
+
+  assert(II->isCommutative() && Op0->isCommutative() &&
+         "Only inner and outer commutative op codes are supported.");
+
+  Value *A = Op0->getOperand(0);
+  Value *B = Op0->getOperand(1);
+  Value *C = Op1->getOperand(0);
+  Value *D = Op1->getOperand(1);
+
+  if (A == C || A == D) {
+    if (A != C)
+      std::swap(C, D);
+
+    Value *NewIntrinsic = Builder.CreateBinaryIntrinsic(TopLevelOpcode, B, D);
+    BinaryOperator *NewBinop =
+        cast<BinaryOperator>(Builder.CreateBinOp(InnerOpcode, NewIntrinsic, A));
+    NewBinop->setHasNoSignedWrap(HasNSW);
+    NewBinop->setHasNoUnsignedWrap(HasNUW);
+    return NewBinop;
+  }
+
+  if (B == D || B == C) {
+    if (B != D)
+      std::swap(C, D);
+
+    Value *NewIntrinsic = Builder.CreateBinaryIntrinsic(TopLevelOpcode, A, C);
+    BinaryOperator *NewBinop =
+        cast<BinaryOperator>(Builder.CreateBinOp(InnerOpcode, NewIntrinsic, B));
+    NewBinop->setHasNoSignedWrap(HasNSW);
+    NewBinop->setHasNoUnsignedWrap(HasNUW);
+    return NewBinop;
+  }
+
+  return nullptr;
+}
+
 /// CallInst simplification. This mostly only handles folding of intrinsic
 /// instructions. For normal calls, it allows visitCallBase to do the heavy
 /// lifting.
@@ -1929,6 +2020,9 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
       }
     }
 
+    if (Value *V = foldIntrinsicUsingDistributiveLaws(II, Builder))
+      return replaceInstUsesWith(*II, V);
+
     break;
   }
   case Intrinsic::bitreverse: {
diff --git a/llvm/test/Transforms/InstCombine/intrinsic-distributive.ll b/llvm/test/Transforms/InstCombine/intrinsic-distributive.ll
new file mode 100644
index 0000000000000..f58ce04cb6711
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/intrinsic-distributive.ll
@@ -0,0 +1,228 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s 2>&1 | FileCheck %s
+
+
+define i8 @umax_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.umax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add nuw i8 %b, %a
+  %add2 = add nuw i8 %c, %a
+  %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+define i8 @umax_of_add_nuw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add_nuw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.umax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add nuw i8 %a, %b
+  %add2 = add nuw i8 %a, %c
+  %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+
+; negative test
+define i8 @umax_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add nsw i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add nsw i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MAX:%.*]] = call i8 @llvm.umax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add nsw i8 %b, %a
+  %add2 = add nsw i8 %c, %a
+  %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+; negative test
+define i8 @umax_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MAX:%.*]] = call i8 @llvm.umax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add i8 %b, %a
+  %add2 = add i8 %c, %a
+  %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+define i8 @umin_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.umin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MIN:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add nuw i8 %b, %a
+  %add2 = add nuw i8 %c, %a
+  %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}
+
+define i8 @umin_of_add_nuw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add_nuw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.umin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MIN:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add nuw i8 %a, %b
+  %add2 = add nuw i8 %a, %c
+  %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}
+
+; negative test
+define i8 @umin_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add nsw i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add nsw i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MIN:%.*]] = call i8 @llvm.umin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add nsw i8 %b, %a
+  %add2 = add nsw i8 %c, %a
+  %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}
+
+; negative test
+define i8 @umin_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MIN:%.*]] = call i8 @llvm.umin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add i8 %b, %a
+  %add2 = add i8 %c, %a
+  %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}
+
+; negative test
+define i8 @smax_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add nuw i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add nuw i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MAX:%.*]] = call i8 @llvm.smax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add nuw i8 %b, %a
+  %add2 = add nuw i8 %c, %a
+  %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+define i8 @smax_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.smax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MAX:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add nsw i8 %b, %a
+  %add2 = add nsw i8 %c, %a
+  %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+define i8 @smax_of_add_nsw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add_nsw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.smax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MAX:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add nsw i8 %a, %b
+  %add2 = add nsw i8 %a, %c
+  %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+; negative test
+define i8 @smax_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MAX:%.*]] = call i8 @llvm.smax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MAX]]
+;
+  %add1 = add i8 %b, %a
+  %add2 = add i8 %c, %a
+  %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+  ret i8 %max
+}
+
+; negative test
+define i8 @smin_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add nuw i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add nuw i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MIN:%.*]] = call i8 @llvm.smin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add nuw i8 %b, %a
+  %add2 = add nuw i8 %c, %a
+  %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}
+
+define i8 @smin_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.smin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MIN:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add nsw i8 %b, %a
+  %add2 = add nsw i8 %c, %a
+  %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}
+
+define i8 @smin_of_add_nsw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add_nsw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i8 @llvm.smin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT:    [[MIN:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add nsw i8 %a, %b
+  %add2 = add nsw i8 %a, %c
+  %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}
+
+; negative test
+define i8 @smin_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT:    [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT:    [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT:    [[MIN:%.*]] = call i8 @llvm.smin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT:    ret i8 [[MIN]]
+;
+  %add1 = add i8 %b, %a
+  %add2 = add i8 %c, %a
+  %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+  ret i8 %min
+}

@jf-botto
Copy link
Contributor Author

jf-botto commented Aug 2, 2024

@dtcxzyw Here's the PR with a fix for the 4 cases you mention in that comment.

cast<BinaryOperator>(Builder.CreateBinOp(InnerOpcode, NewIntrinsic, B));
NewBinop->setHasNoSignedWrap(HasNSW);
NewBinop->setHasNoUnsignedWrap(HasNUW);
return NewBinop;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think the return code has become complex enough to warrant updating the detection logic to:

if(A != C && A != D)
   std::swap(A, B);
if (B == D || B == C)
   ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, totally get it. I've simplified the boolean logic into fewer/simpler arguments.

Comment on lines 1545 to 1555
if (Op0->hasNoUnsignedWrap() != Op1->hasNoUnsignedWrap() ||
Op0->hasNoSignedWrap() != Op1->hasNoSignedWrap())
return nullptr;

if (!Op0->hasOneUse() || !Op1->hasOneUse())
return nullptr;

Instruction::BinaryOps InnerOpcode =
static_cast<Instruction::BinaryOps>(Op0->getOpcode());
bool HasNUW = Op0->hasNoUnsignedWrap();
bool HasNSW = Op0->hasNoSignedWrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Op0->hasNoUnsignedWrap() != Op1->hasNoUnsignedWrap() ||
Op0->hasNoSignedWrap() != Op1->hasNoSignedWrap())
return nullptr;
if (!Op0->hasOneUse() || !Op1->hasOneUse())
return nullptr;
Instruction::BinaryOps InnerOpcode =
static_cast<Instruction::BinaryOps>(Op0->getOpcode());
bool HasNUW = Op0->hasNoUnsignedWrap();
bool HasNSW = Op0->hasNoSignedWrap();
if (!Op0->hasOneUse() || !Op1->hasOneUse())
return nullptr;
Instruction::BinaryOps InnerOpcode =
static_cast<Instruction::BinaryOps>(Op0->getOpcode());
bool HasNUW = Op0->hasNoUnsignedWrap() && Op1->hasNoUnsignedWrap();
bool HasNSW = Op0->hasNoSignedWrap() && Op1->hasNoUnsignedWrap();

It is too strict. Please add a test for smin((add nuw nsw X, Y), (add nsw X, Z)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Added the test for smin and the other 3 intrinsics with various flag combinations.

if (A != C) {
std::swap(C, D);

if (A != D)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is confusing. A != D here always evaluates to true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten the logic in a clearer way.

; CHECK-NEXT: [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
; CHECK-NEXT: ret i8 [[MAX]]
;
%add1 = add nuw i8 %a, %b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a trick to avoid complexity-based canonicalization :)
https://llvm.org/docs/InstCombineContributorGuide.html#add-commuted-tests

@jf-botto jf-botto force-pushed the 92433-1 branch 2 times, most recently from 4f9908a to 656f78f Compare August 3, 2024 16:11
}

if (B == D || B == C)
std::swap(A, B);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you need to still swap this and the above A != C check.

I would just rewrite all the matching as:

if(A != C && A != D)
  std::swap(A, B);
if (A == C || A == D) {
  if (A != C)
    std::swap(C, D);
  // Return NewBinop
} 
return nullptr;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Now I know what you meant before. Thanks.

Comment on lines 1553 to 1554
if (!InnerOpcode)
return nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!InnerOpcode)
return nullptr;

It is just a noop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@jf-botto
Copy link
Contributor Author

ping

// Attempts to swap variables such that A always equals C
if (A != C && A != D)
std::swap(A, B);
if (A == C || A == D) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I'm just curious but does this work for constant As/Cs or splat vectors? For example,

define i8 @f(i8 %x, i8 %y) {
    %add1 = add nuw i8 %x, 42
    %add2 = add nuw i8 %y, 42
    %umin = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
    ret i8 %umin
}

and

define <4 x i8> @src(<4 x i8> %x, <4 x i8> %y) {
    %add1 = add nuw <4 x i8> %x, <i8 42, i8 42, i8 42, i8 42>
    %add2 = add nuw <4 x i8> %y, <i8 42, i8 42, i8 42, i8 42>
    %umin = call <4 x i8> @llvm.umin.v4i8(<4 x i8> %add1, <4 x i8> %add2)
    ret <4 x i8> %umin
}

It might be a good idea to add such a test to the precommitted tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work for constants/splats because the optimisation itself doesn't distinguish between different types of operands. Sure. Will add a test.

case Intrinsic::umax:
return hasNUW && LOp == Instruction::Add;
case Intrinsic::umin:
return hasNUW && LOp == Instruction::Add;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine these cases since there are no functions that distribute over umax but not umin (or vice versa).

  case Intrinsic::umax:
  case Intrinsic::umin:
    return hasNUW && LOp == Instruction::Add;

Proof sketch: Let f be an arbitrary binary function and x, y, z be arbitrary bit vectors. Suppose that (f u (umax v w)) = (umax (f u v) (f u w)) for all u, v, w. Observe that (umin u v) = (xor u v (umax u v)) for all u, v. Then (umin (f x y) (f x z)) = (xor (f x y) (f x z) (umax (f x y) (f x z))) = (xor (f x y) (f x z) (f x (umax y z))). The case (f x y) = (f x z) is trivial, hence suppose they are not equal. Then (f x (umax y z)) is equal to either (f x y) or (f x z), leaving the other as the result of the xor, which equals (f x (umin y z)), as required.

(Similar for smin/smax.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely understand. Fixed.

@@ -1505,6 +1505,80 @@ foldMinimumOverTrailingOrLeadingZeroCount(Value *I0, Value *I1,
ConstantInt::getTrue(ZeroUndef->getType()));
}

/// Return whether "X LOp (Y ROp Z)" is always equal to
/// "(X LOp Y) ROp (X LOp Z)".
static bool foldIntrinsicUsingDistributiveLaws(Instruction::BinaryOps LOp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the name of the function is a bit misleading since it doesn't fold anything but rather checks whether we can apply the transformation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. 100% agree. Fixed.

Copy link

github-actions bot commented Aug 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@goldsteinn goldsteinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a test nit.

; CHECK-NEXT: [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
; CHECK-NEXT: ret i8 [[MAX]]
;
%add1 = add nuw i8 %b, %a ; thwart complexity-based canonicalization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These "thwart" comments don't make sense to me -- the add here is part of the folded pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed them.

@jf-botto
Copy link
Contributor Author

LGTM with a test nit.

Thank you @nikic. Much appreciated it. Would you mind merging it as I don't have write access?

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@nikic nikic merged commit fcd51de into llvm:main Nov 2, 2024
8 checks passed
Copy link

github-actions bot commented Nov 2, 2024

@jf-botto Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 2, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/4907

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86840 of 86841 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.
TIMEOUT: Clangd Unit Tests :: ./ClangdTests/25/158 (8635 of 86840)
******************** TEST 'Clangd Unit Tests :: ./ClangdTests/25/158' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-2537028-25-158.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=158 GTEST_SHARD_INDEX=25 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests
--

Note: This is test shard 26 of 158.
[==========] Running 8 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackgroundQueueTest
[ RUN      ] BackgroundQueueTest.Boost
[       OK ] BackgroundQueueTest.Boost (0 ms)
[----------] 1 test from BackgroundQueueTest (0 ms total)

[----------] 1 test from CompletionTest
[ RUN      ] CompletionTest.NamespaceDoubleInsertion
Built preamble of size 215784 for file /clangd-test/foo.cpp version null in 9.79 seconds
Ignored diagnostic. /clangd-test/foo.cpp:5:5:expected unqualified-id
Sema said no scope specifier, but we saw ns:: in the source code
Code complete: fuzzyFind({
  "AnyScope": false,
  "Limit": null,
  "PreferredTypes": [],
  "ProximityPaths": [
    "/clangd-test/foo.cpp"
  ],
  "Query": "ABC",
  "RestrictForCodeCompletion": true,
  "Scopes": [
    "foo::ns::",
    "ns::"
  ]
})
Code complete: sema context TopLevel, query scopes [foo::ns::,ns::] (AnyScope=false), expected type <none>
Code complete: 0 results from Sema, 1 from Index, 0 matched, 0 from identifiers, 1 returned.
[       OK ] CompletionTest.NamespaceDoubleInsertion (11670 ms)
[----------] 1 test from CompletionTest (11670 ms total)

[----------] 1 test from DexTest
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86840 of 86841 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.
TIMEOUT: Clangd Unit Tests :: ./ClangdTests/25/158 (8635 of 86840)
******************** TEST 'Clangd Unit Tests :: ./ClangdTests/25/158' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-2537028-25-158.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=158 GTEST_SHARD_INDEX=25 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests
--

Note: This is test shard 26 of 158.
[==========] Running 8 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackgroundQueueTest
[ RUN      ] BackgroundQueueTest.Boost
[       OK ] BackgroundQueueTest.Boost (0 ms)
[----------] 1 test from BackgroundQueueTest (0 ms total)

[----------] 1 test from CompletionTest
[ RUN      ] CompletionTest.NamespaceDoubleInsertion
Built preamble of size 215784 for file /clangd-test/foo.cpp version null in 9.79 seconds
Ignored diagnostic. /clangd-test/foo.cpp:5:5:expected unqualified-id
Sema said no scope specifier, but we saw ns:: in the source code
Code complete: fuzzyFind({
  "AnyScope": false,
  "Limit": null,
  "PreferredTypes": [],
  "ProximityPaths": [
    "/clangd-test/foo.cpp"
  ],
  "Query": "ABC",
  "RestrictForCodeCompletion": true,
  "Scopes": [
    "foo::ns::",
    "ns::"
  ]
})
Code complete: sema context TopLevel, query scopes [foo::ns::,ns::] (AnyScope=false), expected type <none>
Code complete: 0 results from Sema, 1 from Index, 0 matched, 0 from identifiers, 1 returned.
[       OK ] CompletionTest.NamespaceDoubleInsertion (11670 ms)
[----------] 1 test from CompletionTest (11670 ms total)

[----------] 1 test from DexTest

smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…1717)

This PR fixes part of llvm#92433.

It specifically adds the 4 cases mentioned in
llvm#92433 (comment).

I've added 8 positive tests, 4 of which are mentioned in the comment
above and 4 which are their commutative equivalents. Alive proof:
https://alive2.llvm.org/ce/z/z6eFTb
I've also added 8 negative tests, because we want to make sure we do not
optimise if the relevant flags are not relevant because the optimisation
wouldn't be sound. Alive proof that the optimisation is invalid:
https://alive2.llvm.org/ce/z/NvNjTD
I did have to make the integer types `i4` to make Alive not timeout and
to fit them all on one page.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…1717)

This PR fixes part of llvm#92433.

It specifically adds the 4 cases mentioned in
llvm#92433 (comment).

I've added 8 positive tests, 4 of which are mentioned in the comment
above and 4 which are their commutative equivalents. Alive proof:
https://alive2.llvm.org/ce/z/z6eFTb
I've also added 8 negative tests, because we want to make sure we do not
optimise if the relevant flags are not relevant because the optimisation
wouldn't be sound. Alive proof that the optimisation is invalid:
https://alive2.llvm.org/ce/z/NvNjTD
I did have to make the integer types `i4` to make Alive not timeout and
to fit them all on one page.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants