[InstCombine] Factorise Add and Min/Max using Distributivity #101717
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can tag reviewers by name in a comment instead. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-transforms

Author: Jorge Botto (jf-botto)

Changes: This PR fixes part of #92433. It specifically adds the 4 cases mentioned in #92433 (comment). I've added 8 positive tests, 4 of which are mentioned in the comment above and 4 which are their commutative equivalents. Alive proof: https://alive2.llvm.org/ce/z/z6eFTb

Full diff: https://github.com/llvm/llvm-project/pull/101717.diff — 3 Files Affected:
diff --git a/llvm/include/llvm/IR/Operator.h b/llvm/include/llvm/IR/Operator.h
index f63f54ef94107..ec8b3f4b6318f 100644
--- a/llvm/include/llvm/IR/Operator.h
+++ b/llvm/include/llvm/IR/Operator.h
@@ -123,6 +123,9 @@ class OverflowingBinaryOperator : public Operator {
return NoWrapKind;
}
+ /// Return true if the instruction is commutative:
+ bool isCommutative() const { return Instruction::isCommutative(getOpcode()); }
+
static bool classof(const Instruction *I) {
return I->getOpcode() == Instruction::Add ||
I->getOpcode() == Instruction::Sub ||
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index cc68fd4cf1c1b..8944eec2d63d4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -1505,6 +1505,97 @@ foldMinimumOverTrailingOrLeadingZeroCount(Value *I0, Value *I1,
ConstantInt::getTrue(ZeroUndef->getType()));
}
+/// Return whether "X LOp (Y ROp Z)" is always equal to
+/// "(X LOp Y) ROp (X LOp Z)".
+static bool leftDistributesOverRightIntrinsic(Instruction::BinaryOps LOp,
+ bool hasNUW, bool hasNSW,
+ Intrinsic::ID ROp) {
+ switch (ROp) {
+ case Intrinsic::umax:
+ return hasNUW && LOp == Instruction::Add;
+ case Intrinsic::umin:
+ return hasNUW && LOp == Instruction::Add;
+ case Intrinsic::smax:
+ return hasNSW && LOp == Instruction::Add;
+ case Intrinsic::smin:
+ return hasNSW && LOp == Instruction::Add;
+ default:
+ return false;
+ }
+}
+
+// Attempts to factorise a common term in an instruction that has the form
+// "(A op' B) op (C op' D)", where op is a min/max intrinsic and op' is a
+// binop.
+static Value *
+foldIntrinsicUsingDistributiveLaws(IntrinsicInst *II,
+ InstCombiner::BuilderTy &Builder) {
+ Value *LHS = II->getOperand(0), *RHS = II->getOperand(1);
+ Intrinsic::ID TopLevelOpcode = II->getIntrinsicID();
+
+ OverflowingBinaryOperator *Op0 = dyn_cast<OverflowingBinaryOperator>(LHS);
+ OverflowingBinaryOperator *Op1 = dyn_cast<OverflowingBinaryOperator>(RHS);
+
+ if (!Op0 || !Op1)
+ return nullptr;
+
+ if (Op0->getOpcode() != Op1->getOpcode())
+ return nullptr;
+
+ if (!(Op0->hasNoUnsignedWrap() == Op1->hasNoUnsignedWrap()) ||
+ !(Op0->hasNoSignedWrap() == Op1->hasNoSignedWrap()))
+ return nullptr;
+
+ if (!Op0->hasOneUse() || !Op1->hasOneUse())
+ return nullptr;
+
+ Instruction::BinaryOps InnerOpcode =
+ static_cast<Instruction::BinaryOps>(Op0->getOpcode());
+ bool HasNUW = Op0->hasNoUnsignedWrap();
+ bool HasNSW = Op0->hasNoSignedWrap();
+
+ if (!InnerOpcode)
+ return nullptr;
+
+ if (!leftDistributesOverRightIntrinsic(InnerOpcode, HasNUW, HasNSW,
+ TopLevelOpcode))
+ return nullptr;
+
+ assert(II->isCommutative() && Op0->isCommutative() &&
+ "Only inner and outer commutative op codes are supported.");
+
+ Value *A = Op0->getOperand(0);
+ Value *B = Op0->getOperand(1);
+ Value *C = Op1->getOperand(0);
+ Value *D = Op1->getOperand(1);
+
+ if (A == C || A == D) {
+ if (A != C)
+ std::swap(C, D);
+
+ Value *NewIntrinsic = Builder.CreateBinaryIntrinsic(TopLevelOpcode, B, D);
+ BinaryOperator *NewBinop =
+ cast<BinaryOperator>(Builder.CreateBinOp(InnerOpcode, NewIntrinsic, A));
+ NewBinop->setHasNoSignedWrap(HasNSW);
+ NewBinop->setHasNoUnsignedWrap(HasNUW);
+ return NewBinop;
+ }
+
+ if (B == D || B == C) {
+ if (B != D)
+ std::swap(C, D);
+
+ Value *NewIntrinsic = Builder.CreateBinaryIntrinsic(TopLevelOpcode, A, C);
+ BinaryOperator *NewBinop =
+ cast<BinaryOperator>(Builder.CreateBinOp(InnerOpcode, NewIntrinsic, B));
+ NewBinop->setHasNoSignedWrap(HasNSW);
+ NewBinop->setHasNoUnsignedWrap(HasNUW);
+ return NewBinop;
+ }
+
+ return nullptr;
+}
+
/// CallInst simplification. This mostly only handles folding of intrinsic
/// instructions. For normal calls, it allows visitCallBase to do the heavy
/// lifting.
@@ -1929,6 +2020,9 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
}
}
+ if (Value *V = foldIntrinsicUsingDistributiveLaws(II, Builder))
+ return replaceInstUsesWith(*II, V);
+
break;
}
case Intrinsic::bitreverse: {
diff --git a/llvm/test/Transforms/InstCombine/intrinsic-distributive.ll b/llvm/test/Transforms/InstCombine/intrinsic-distributive.ll
new file mode 100644
index 0000000000000..f58ce04cb6711
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/intrinsic-distributive.ll
@@ -0,0 +1,228 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s 2>&1 | FileCheck %s
+
+
+define i8 @umax_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.umax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add nuw i8 %b, %a
+ %add2 = add nuw i8 %c, %a
+ %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+define i8 @umax_of_add_nuw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add_nuw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.umax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add nuw i8 %a, %b
+ %add2 = add nuw i8 %a, %c
+ %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+
+; negative test
+define i8 @umax_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add nsw i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add nsw i8 [[C]], [[A]]
+; CHECK-NEXT: [[MAX:%.*]] = call i8 @llvm.umax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add nsw i8 %b, %a
+ %add2 = add nsw i8 %c, %a
+ %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+; negative test
+define i8 @umax_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umax_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT: [[MAX:%.*]] = call i8 @llvm.umax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add i8 %b, %a
+ %add2 = add i8 %c, %a
+ %max = call i8 @llvm.umax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+define i8 @umin_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.umin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MIN:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add nuw i8 %b, %a
+ %add2 = add nuw i8 %c, %a
+ %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
+
+define i8 @umin_of_add_nuw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add_nuw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.umin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MIN:%.*]] = add nuw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add nuw i8 %a, %b
+ %add2 = add nuw i8 %a, %c
+ %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
+
+; negative test
+define i8 @umin_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add nsw i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add nsw i8 [[C]], [[A]]
+; CHECK-NEXT: [[MIN:%.*]] = call i8 @llvm.umin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add nsw i8 %b, %a
+ %add2 = add nsw i8 %c, %a
+ %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
+
+; negative test
+define i8 @umin_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @umin_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT: [[MIN:%.*]] = call i8 @llvm.umin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add i8 %b, %a
+ %add2 = add i8 %c, %a
+ %min = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
+
+; negative test
+define i8 @smax_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add nuw i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add nuw i8 [[C]], [[A]]
+; CHECK-NEXT: [[MAX:%.*]] = call i8 @llvm.smax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add nuw i8 %b, %a
+ %add2 = add nuw i8 %c, %a
+ %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+define i8 @smax_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.smax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MAX:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add nsw i8 %b, %a
+ %add2 = add nsw i8 %c, %a
+ %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+define i8 @smax_of_add_nsw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add_nsw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.smax.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MAX:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add nsw i8 %a, %b
+ %add2 = add nsw i8 %a, %c
+ %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+; negative test
+define i8 @smax_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smax_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT: [[MAX:%.*]] = call i8 @llvm.smax.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MAX]]
+;
+ %add1 = add i8 %b, %a
+ %add2 = add i8 %c, %a
+ %max = call i8 @llvm.smax.i8(i8 %add1, i8 %add2)
+ ret i8 %max
+}
+
+; negative test
+define i8 @smin_of_add_nuw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add_nuw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add nuw i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add nuw i8 [[C]], [[A]]
+; CHECK-NEXT: [[MIN:%.*]] = call i8 @llvm.smin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add nuw i8 %b, %a
+ %add2 = add nuw i8 %c, %a
+ %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
+
+define i8 @smin_of_add_nsw(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add_nsw(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.smin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MIN:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add nsw i8 %b, %a
+ %add2 = add nsw i8 %c, %a
+ %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
+
+define i8 @smin_of_add_nsw_comm(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add_nsw_comm(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.smin.i8(i8 [[B]], i8 [[C]])
+; CHECK-NEXT: [[MIN:%.*]] = add nsw i8 [[TMP1]], [[A]]
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add nsw i8 %a, %b
+ %add2 = add nsw i8 %a, %c
+ %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
+
+; negative test
+define i8 @smin_of_add(i8 %a, i8 %b, i8 %c) {
+; CHECK-LABEL: define i8 @smin_of_add(
+; CHECK-SAME: i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]]) {
+; CHECK-NEXT: [[ADD1:%.*]] = add i8 [[B]], [[A]]
+; CHECK-NEXT: [[ADD2:%.*]] = add i8 [[C]], [[A]]
+; CHECK-NEXT: [[MIN:%.*]] = call i8 @llvm.smin.i8(i8 [[ADD1]], i8 [[ADD2]])
+; CHECK-NEXT: ret i8 [[MIN]]
+;
+ %add1 = add i8 %b, %a
+ %add2 = add i8 %c, %a
+ %min = call i8 @llvm.smin.i8(i8 %add1, i8 %add2)
+ ret i8 %min
+}
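For a quick sanity check of the identities the new fold relies on (complementing the Alive2 proofs linked above), here is a minimal standalone C++ sketch, not part of the patch, that brute-forces the add-nuw/umax case at i8 width:

#include <algorithm>
#include <cassert>

// Exhaustively checks umax(a+b, a+c) == a + umax(b, c) at 8 bits, restricted
// to inputs where both adds have no unsigned wrap (the nuw precondition).
int main() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      for (unsigned c = 0; c < 256; ++c) {
        if (a + b > 255 || a + c > 255)
          continue; // an add would wrap: excluded by nuw
        assert(std::max(a + b, a + c) == a + std::max(b, c));
      }
  return 0;
}

The other three cases (umin with nuw, smax/smin with nsw) can be checked the same way by swapping the comparison and the wrap condition.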
@dtcxzyw Here's the PR with a fix for the 4 cases you mentioned in that comment.
cast<BinaryOperator>(Builder.CreateBinOp(InnerOpcode, NewIntrinsic, B));
NewBinop->setHasNoSignedWrap(HasNSW);
NewBinop->setHasNoUnsignedWrap(HasNUW);
return NewBinop;
Think the return code has become complex enough to warrant updating the detection logic to:

if (A != C && A != D)
  std::swap(A, B);
if (B == D || B == C)
  ...
Thanks, totally get it. I've simplified the boolean logic into fewer/simpler arguments.
if (Op0->hasNoUnsignedWrap() != Op1->hasNoUnsignedWrap() ||
    Op0->hasNoSignedWrap() != Op1->hasNoSignedWrap())
  return nullptr;

if (!Op0->hasOneUse() || !Op1->hasOneUse())
  return nullptr;

Instruction::BinaryOps InnerOpcode =
    static_cast<Instruction::BinaryOps>(Op0->getOpcode());
bool HasNUW = Op0->hasNoUnsignedWrap();
bool HasNSW = Op0->hasNoSignedWrap();
if (Op0->hasNoUnsignedWrap() != Op1->hasNoUnsignedWrap() || | |
Op0->hasNoSignedWrap() != Op1->hasNoSignedWrap()) | |
return nullptr; | |
if (!Op0->hasOneUse() || !Op1->hasOneUse()) | |
return nullptr; | |
Instruction::BinaryOps InnerOpcode = | |
static_cast<Instruction::BinaryOps>(Op0->getOpcode()); | |
bool HasNUW = Op0->hasNoUnsignedWrap(); | |
bool HasNSW = Op0->hasNoSignedWrap(); | |
if (!Op0->hasOneUse() || !Op1->hasOneUse()) | |
return nullptr; | |
Instruction::BinaryOps InnerOpcode = | |
static_cast<Instruction::BinaryOps>(Op0->getOpcode()); | |
bool HasNUW = Op0->hasNoUnsignedWrap() && Op1->hasNoUnsignedWrap(); | |
bool HasNSW = Op0->hasNoSignedWrap() && Op1->hasNoUnsignedWrap(); |
It is too strict. Please add a test for smin((add nuw nsw X, Y), (add nsw X, Z)).
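(To see why the current check is only too strict rather than unsound — i.e. why intersecting the flags works — the requested mixed-flags case can be brute-forced at 8 bits; a standalone sketch under the stated preconditions, not part of the patch:)

#include <algorithm>
#include <cassert>

// Checks smin(x+y, x+z) == x + smin(y, z) at i8, requiring only that both
// adds are nsw; one add additionally carrying nuw (as in the requested test)
// only shrinks the input set, so the identity continues to hold.
int main() {
  for (int x = -128; x < 128; ++x)
    for (int y = -128; y < 128; ++y)
      for (int z = -128; z < 128; ++z) {
        int s1 = x + y, s2 = x + z;
        if (s1 < -128 || s1 > 127 || s2 < -128 || s2 > 127)
          continue; // signed wrap: excluded by the nsw precondition
        assert(std::min(s1, s2) == x + std::min(y, z));
      }
  return 0;
}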
Thank you. Added the test for smin and the other 3 intrinsics with various flag combinations.
if (A != C) {
  std::swap(C, D);

if (A != D)
It is confusing. A != D here always evaluates to true.
I've rewritten the logic in a clearer way.
; CHECK-NEXT: [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
; CHECK-NEXT: ret i8 [[MAX]]
;
%add1 = add nuw i8 %a, %b
You need a trick to avoid complexity-based canonicalization :) https://llvm.org/docs/InstCombineContributorGuide.html#add-commuted-tests
Force-pushed from 4f9908a to 656f78f.
}

if (B == D || B == C)
  std::swap(A, B);
Think you need to still swap this and the above A != C check. I would just rewrite all the matching as:

if (A != C && A != D)
  std::swap(A, B);
if (A == C || A == D) {
  if (A != C)
    std::swap(C, D);
  // Return NewBinop
}
return nullptr;
Ah! Now I know what you meant before. Thanks.
if (!InnerOpcode)
  return nullptr;

It is just a no-op.
Thank you.
ping
// Attempts to swap variables such that A always equals C
if (A != C && A != D)
  std::swap(A, B);
if (A == C || A == D) {
Question: I'm just curious, but does this work for constant A/C operands or splat vectors? For example,

define i8 @f(i8 %x, i8 %y) {
  %add1 = add nuw i8 %x, 42
  %add2 = add nuw i8 %y, 42
  %umin = call i8 @llvm.umin.i8(i8 %add1, i8 %add2)
  ret i8 %umin
}

and

define <4 x i8> @src(<4 x i8> %x, <4 x i8> %y) {
  %add1 = add nuw <4 x i8> %x, <i8 42, i8 42, i8 42, i8 42>
  %add2 = add nuw <4 x i8> %y, <i8 42, i8 42, i8 42, i8 42>
  %umin = call <4 x i8> @llvm.umin.v4i8(<4 x i8> %add1, <4 x i8> %add2)
  ret <4 x i8> %umin
}

It might be a good idea to add such a test to the precommitted tests.
It would work for constants/splats because the optimisation itself doesn't distinguish between different types of operands. Sure. Will add a test.
case Intrinsic::umax:
  return hasNUW && LOp == Instruction::Add;
case Intrinsic::umin:
  return hasNUW && LOp == Instruction::Add;
I think we can combine these cases, since there are no functions that distribute over umax but not umin (or vice versa):

case Intrinsic::umax:
case Intrinsic::umin:
  return hasNUW && LOp == Instruction::Add;

Proof sketch: Let f be an arbitrary binary function and x, y, z be arbitrary bit vectors. Suppose that (f u (umax v w)) = (umax (f u v) (f u w)) for all u, v, w. Observe that (umin u v) = (xor u v (umax u v)) for all u, v. Then (umin (f x y) (f x z)) = (xor (f x y) (f x z) (umax (f x y) (f x z))) = (xor (f x y) (f x z) (f x (umax y z))). The case (f x y) = (f x z) is trivial, hence suppose they are not equal. Then (f x (umax y z)) is equal to either (f x y) or (f x z), leaving the other as the result of the xor, which equals (f x (umin y z)), as required. (Similar for smin/smax.)
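(The xor identity in the proof above is also easy to confirm exhaustively at 8 bits; a standalone sketch, not part of the review:)

#include <algorithm>
#include <cassert>
#include <cstdint>

// Verifies (umin u v) == (xor u v (umax u v)) for all i8 pairs:
// {u, v} = {min, max}, so xor-ing both inputs with the max leaves the min.
int main() {
  for (unsigned u = 0; u < 256; ++u)
    for (unsigned v = 0; v < 256; ++v) {
      uint8_t a = u, b = v;
      assert(std::min(a, b) == (uint8_t)(a ^ b ^ std::max(a, b)));
    }
  return 0;
}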
Completely understand. Fixed.
@@ -1505,6 +1505,80 @@ foldMinimumOverTrailingOrLeadingZeroCount(Value *I0, Value *I1,
  ConstantInt::getTrue(ZeroUndef->getType()));
}

/// Return whether "X LOp (Y ROp Z)" is always equal to
/// "(X LOp Y) ROp (X LOp Z)".
static bool foldIntrinsicUsingDistributiveLaws(Instruction::BinaryOps LOp,
In my opinion, the name of the function is a bit misleading since it doesn't fold anything but rather checks whether we can apply the transformation.
Thanks. 100% agree. Fixed.
✅ With the latest revision this PR passed the C/C++ code formatter.
LGTM
LGTM with a test nit.
; CHECK-NEXT: [[MAX:%.*]] = add nuw i8 [[TMP1]], [[A]]
; CHECK-NEXT: ret i8 [[MAX]]
;
%add1 = add nuw i8 %b, %a ; thwart complexity-based canonicalization
These "thwart" comments don't make sense to me -- the add here is part of the folded pattern.
I've removed them.
Thank you @nikic, much appreciated. Would you mind merging it, as I don't have write access?
LGTM. Thank you!
Co-authored-by: Yingwei Zheng <[email protected]>
@jf-botto Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/4907

Here is the relevant piece of the build log for reference.
…1717) This PR fixes part of llvm#92433. It specifically adds the 4 cases mentioned in llvm#92433 (comment). I've added 8 positive tests, 4 of which are mentioned in the comment above and 4 which are their commutative equivalents. Alive proof: https://alive2.llvm.org/ce/z/z6eFTb I've also added 8 negative tests, because we want to make sure we do not optimise when the relevant flags are absent, as the optimisation wouldn't be sound. Alive proof that the optimisation is invalid: https://alive2.llvm.org/ce/z/NvNjTD I did have to make the integer types `i4` to make Alive not time out and to fit them all on one page.
This PR fixes part of #92433.
It specifically adds the 4 cases mentioned in #92433 (comment).
I've added 8 positive tests, 4 of which are mentioned in the comment above and 4 which are their commutative equivalents. Alive proof: https://alive2.llvm.org/ce/z/z6eFTb
I've also added 8 negative tests, because we want to make sure we do not optimise when the relevant flags are absent, as the optimisation wouldn't be sound. Alive proof that the optimisation is invalid: https://alive2.llvm.org/ce/z/NvNjTD
I did have to make the integer types i4 to make Alive not time out and to fit them all on one page.
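As a concrete illustration of why the flags matter (a standalone sketch, not part of the patch): without nuw, the unsigned fold is unsound because a wrapping add can change which operand is the maximum. At i8, a = 255, b = 0, c = 1 is such a counterexample:

#include <algorithm>
#include <cassert>
#include <cstdint>

// Shows umax(a+b, a+c) != a + umax(b, c) once an add wraps at 8 bits.
int main() {
  uint8_t a = 255, b = 0, c = 1;
  uint8_t lhs = std::max<uint8_t>(a + b, a + c); // max(255, 0) == 255
  uint8_t rhs = a + std::max(b, c);              // 255 + 1 wraps to 0
  assert(lhs == 255 && rhs == 0); // the fold would change the result
  return 0;
}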