-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[InstCombine] Improve select simplification based on known bits #97289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Nikita Popov (nikic) Changes#95923 added support to simplify select arms based on known bits implied by the select condition. That PR was limited to the case where the arm folds to a constant. This PR extends support to the case where the select arm simplifies in some other way, for example by removing a bitwise operation. This is done by calling SimplifyDemandedBits() with the adjusted SimplifyQuery. Unfortunately, this case requires that the condition isn't undef. Fixes #71533. Full diff: https://github.com/llvm/llvm-project/pull/97289.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 979495dae853e..fbbe2251b97e6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -4052,22 +4052,36 @@ Instruction *InstCombinerImpl::visitSelectInst(SelectInst &SI) {
});
SimplifyQuery Q = SQ.getWithInstruction(&SI).getWithCondContext(CC);
if (!CC.AffectedValues.empty()) {
- if (!isa<Constant>(TrueVal) &&
- hasAffectedValue(TrueVal, CC.AffectedValues, /*Depth=*/0)) {
- KnownBits Known = llvm::computeKnownBits(TrueVal, /*Depth=*/0, Q);
- if (Known.isConstant())
- return replaceOperand(SI, 1,
- ConstantInt::get(SelType, Known.getConstant()));
- }
+ std::optional<bool> NotUndef;
+ auto SimplifyOp = [&](unsigned OpNum) -> Instruction * {
+ Value *V = SI.getOperand(OpNum);
+ if (isa<Constant>(V) ||
+ !hasAffectedValue(V, CC.AffectedValues, /*Depth=*/0))
+ return nullptr;
+
+ if (!NotUndef)
+ NotUndef = isGuaranteedNotToBeUndef(CondVal);
+ if (*NotUndef) {
+ unsigned BitWidth = SelType->getScalarSizeInBits();
+ KnownBits Known(BitWidth);
+ if (SimplifyDemandedBits(&SI, OpNum, APInt::getAllOnes(BitWidth),
+ Known, /*Depth=*/0, Q))
+ return &SI;
+ } else {
+ KnownBits Known = llvm::computeKnownBits(V, /*Depth=*/0, Q);
+ if (Known.isConstant())
+ return replaceOperand(
+ SI, OpNum, ConstantInt::get(SelType, Known.getConstant()));
+ }
+ return nullptr;
+ };
+
+ if (Instruction *Res = SimplifyOp(1))
+ return Res;
CC.Invert = true;
- if (!isa<Constant>(FalseVal) &&
- hasAffectedValue(FalseVal, CC.AffectedValues, /*Depth=*/0)) {
- KnownBits Known = llvm::computeKnownBits(FalseVal, /*Depth=*/0, Q);
- if (Known.isConstant())
- return replaceOperand(SI, 2,
- ConstantInt::get(SelType, Known.getConstant()));
- }
+ if (Instruction *Res = SimplifyOp(2))
+ return Res;
}
}
diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll
index 63a4f74cbaaea..29805e07d0ac9 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -2989,9 +2989,8 @@ define i8 @select_replacement_loop3(i32 noundef %x) {
define i16 @select_replacement_loop4(i16 noundef %p_12) {
; CHECK-LABEL: @select_replacement_loop4(
-; CHECK-NEXT: [[AND1:%.*]] = and i16 [[P_12:%.*]], 1
-; CHECK-NEXT: [[CMP21:%.*]] = icmp ult i16 [[P_12]], 2
-; CHECK-NEXT: [[AND3:%.*]] = select i1 [[CMP21]], i16 [[AND1]], i16 0
+; CHECK-NEXT: [[CMP1:%.*]] = icmp ult i16 [[P_12:%.*]], 2
+; CHECK-NEXT: [[AND3:%.*]] = select i1 [[CMP1]], i16 [[P_12]], i16 0
; CHECK-NEXT: ret i16 [[AND3]]
;
%cmp1 = icmp ult i16 %p_12, 2
@@ -4671,8 +4670,7 @@ define i8 @select_knownbits_simplify(i8 noundef %x) {
; CHECK-LABEL: @select_knownbits_simplify(
; CHECK-NEXT: [[X_LO:%.*]] = and i8 [[X:%.*]], 1
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[X_LO]], 0
-; CHECK-NEXT: [[AND:%.*]] = and i8 [[X]], -2
-; CHECK-NEXT: [[RES:%.*]] = select i1 [[CMP]], i8 [[AND]], i8 0
+; CHECK-NEXT: [[RES:%.*]] = select i1 [[CMP]], i8 [[X]], i8 0
; CHECK-NEXT: ret i8 [[RES]]
;
%x.lo = and i8 %x, 1
@@ -4686,8 +4684,7 @@ define i8 @select_knownbits_simplify_nested(i8 noundef %x) {
; CHECK-LABEL: @select_knownbits_simplify_nested(
; CHECK-NEXT: [[X_LO:%.*]] = and i8 [[X:%.*]], 1
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[X_LO]], 0
-; CHECK-NEXT: [[AND:%.*]] = and i8 [[X]], -2
-; CHECK-NEXT: [[MUL:%.*]] = mul i8 [[AND]], [[AND]]
+; CHECK-NEXT: [[MUL:%.*]] = mul i8 [[X]], [[X]]
; CHECK-NEXT: [[RES:%.*]] = select i1 [[CMP]], i8 [[MUL]], i8 0
; CHECK-NEXT: ret i8 [[RES]]
;
|
When calling SimplifyDemandedBits (as opposed to SimplifyDemandedInstructionBits), and there are multiple uses, always use SimplifyMultipleUseDemandedBits and drop the special case for root values. This fixes the ephemeral value detection, as seen by the restored assumes in tests. It may result in more or less simplification, depending on whether we get more out of having demanded bits or the ability to perform non-multi-use transforms. The change in the phi-known-bits.ll test is because the icmp operand now gets simplified based on demanded bits, which then prevents a different known bits simplification later. This also makes the code safe against future changes like #97289, which add more context that would have to be discarded for the multi-use case.
When calling SimplifyDemandedBits (as opposed to SimplifyDemandedInstructionBits), and there are multiple uses, always use SimplifyMultipleUseDemandedBits and drop the special case for root values. This fixes the ephemeral value detection, as seen by the restored assumes in tests. It may result in more or less simplification, depending on whether we get more out of having demanded bits or the ability to perform non-multi-use transforms. The change in the phi-known-bits.ll test is because the icmp operand now gets simplified based on demanded bits, which then prevents a different known bits simplification later. This also makes the code safe against future changes like llvm#97289, which add more context that would have to be discarded for the multi-use case.
When calling SimplifyDemandedBits (as opposed to SimplifyDemandedInstructionBits), and there are multiple uses, always use SimplifyMultipleUseDemandedBits and drop the special case for root values. This fixes the ephemeral value detection, as seen by the restored assumes in tests. It may result in more or less simplification, depending on whether we get more out of having demanded bits or the ability to perform non-multi-use transforms. The change in the phi-known-bits.ll test is because the icmp operand now gets simplified based on demanded bits, which then prevents a different known bits simplification later. This also makes the code safe against future changes like llvm#97289, which add more context that would have to be discarded for the multi-use case.
This patch breaks a ton of |
return nullptr; | ||
|
||
if (!NotUndef) | ||
NotUndef = isGuaranteedNotToBeUndef(CondVal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm? Aren't we going to end up skipping the noundef
check on false arm if true arm makes it this far?
Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan. |
I don't really get how the optimization in this PR would help the motivating issues for #66740? |
Sorry, I misread the fold in #66740 as |
Yeah. And note that I have a minimal reproduction here: https://github.com/cyyself/verilator-slow-compile-cases/tree/master/XiangShan-136f64975e-0a9b31bb30 You may need some headers in Verilator to get this being compiled. For most distro, you can directly install a verilator. |
Can you file a separate ticket for this? |
Extend decomposeBitTestICmp() to handle cases where the resulting comparison is of the form `icmp (X & Mask) pred Cmp` with non-zero `Cmp`. Add a flag to allow code to opt-in to this behavior and use it in the "log op of icmp" fold infrastructure. This addresses regressions from llvm#97289. Proofs: https://alive2.llvm.org/ce/z/hUhdbU
Extend decomposeBitTestICmp() to handle cases where the resulting comparison is of the form `icmp (X & Mask) pred Cmp` with non-zero `Cmp`. Add a flag to allow code to opt-in to this behavior and use it in the "log op of icmp" fold infrastructure. This addresses regressions from llvm#97289. Proofs: https://alive2.llvm.org/ce/z/hUhdbU
Extend decomposeBitTestICmp() to handle cases where the resulting comparison is of the form `icmp (X & Mask) pred C` with non-zero `C`. Add a flag to allow code to opt-in to this behavior and use it in the "log op of icmp" fold infrastructure. This addresses regressions from #97289. Proofs: https://alive2.llvm.org/ce/z/hUhdbU
7aaf72d
to
f11c7ce
Compare
Unfortunately #110836 did not fix all regressions here. For example this one still exists: https://alive2.llvm.org/ce/z/vQxX3c |
Unoptimized IR for one of the problematic cases: https://llvm.godbolt.org/z/evx3zvMEc Originally, the select true value was folded to a xor with sign mask. When we thread the icmp eq over the select it becomes a simple icmp. Instead demanded bits simplification now replaces the xor with an and and then shrinks it. Let me try disabling that particular fold and see if it is improves things... |
#95923 added support to simplify select arms based on known bits implied by the select condition. That PR was limited to the case where the arm folds to a constant.
This PR extends support to the case where the select arm simplifies in some other way, for example by removing a bitwise operation. This is done by calling SimplifyDemandedBits() with the adjusted SimplifyQuery.
Unfortunately, this case requires that the condition isn't undef.
Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=11484cb817bcc2a6e2ef9572be982a1a5a4964ec&to=67da0f88b52c613a277f93cb8d0183939786a9c7&stat=instructions%3Au
Fixes #71533.