[TTI][RISCV] Unconditionally break critical edges to sink ADDI #108889

Conversation
This looks like a rather weird change, so let me explain why it isn't as unreasonable as it looks. Let's start with the problem it's solving:

```llvm
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) {
bb:
  %i = icmp eq i32 %arg1, 1
  br i1 %i, label %bb2, label %bb5

bb2:                                              ; preds = %bb
  %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
  %i4 = load i32, ptr %i3, align 4
  br label %bb5

bb5:                                              ; preds = %bb2, %bb
  %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
  ret i32 %i6
}
```

Right now, we codegen this as:

```asm
        li      a3, 1
        li      a2, 13
        bne     a1, a3, .LBB0_2
        lw      a2, 4(a0)
.LBB0_2:
        mv      a0, a2
        ret
```

In this example, we have two values which must be assigned to a0 per the ABI (%arg, and the return value). SelectionDAG ensures that all values used in a successor phi are defined before exiting the predecessor block. This creates an ADDI to materialize the immediate in the entry block. Currently, this ADDI is not sunk into the tail block because we'd have to split a critical edge to do so. Note that if our immediate were large enough to require two instructions, we *would* split this critical edge.

Looking at other targets, we notice that they don't seem to have this problem: they perform the sinking and tail duplication that we don't. Why? It turns out that for AArch64 this is entirely an accident of the existence of the gpr32all register class. The immediate is materialized into the gpr32 class and then copied into the gpr32all register class. The existence of that copy puts us right back into the two-instruction case noted above.

This change essentially just cuts around that accident and implements the same "always sink immediates" behavior for RISC-V as well.
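For contrast, after this change the same function codegens as follows (reproduced from the updated machine-sink-load-immediate.ll test in the diff below):

```asm
        li      a2, 1
        bne     a1, a2, .LBB1_2
# %bb.1:                                # %bb2
        lw      a0, 4(a0)
        ret
.LBB1_2:
        li      a0, 13
        ret
```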
```
; RV64-NEXT:  .LBB3_2:
; RV64-NEXT:    li a0, 6
```
These select lowering changes are arguably real regressions. We are losing the opportunity to allocate the untaken result into a register and then conditionally overwrite it. The tail duplication probably wouldn't happen in real code, so we'd end up emitting an extra unconditional jump to bypass the critical edge we split before the join.
Specifically for the selects in these tests, most can be done via arithmetic expansions instead (see the sketch below). I started to implement that, but ended up pulling on a few too many intertwined changes. I do plan to come back to this, but I don't consider the select impact blocking here.
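To make "arithmetic expansion" concrete: for select_const_int_harder below, the two constants differ by exactly 32, so a branchless sequence is possible. This is only a sketch of the direction, not what this PR (or the in-progress work) emits:

```asm
# select i1 %a, i32 6, i32 38  ==  38 - (zext %a << 5)
        slli    a1, a0, 5       # a1 = %a ? 32 : 0
        li      a0, 38
        sub     a0, a0, a1      # 6 when %a is true, 38 otherwise
```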
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes: (the PR description above, repeated verbatim by the subscriber bot)

Patch is 23.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108889.diff

10 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index a3bfc63f2a4790..aef2c1748a099d 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -159,6 +159,12 @@ class TargetInstrInfo : public MCInstrInfo {
return true;
}
+ /// For a "cheap" instruction which doesn't enable additional sinking,
+ /// should MachineSink break a critical edge to sink it anyways?
+ virtual bool shouldBreakCriticalEdgeToSink(MachineInstr &MI) const {
+ return false;
+ }
+
protected:
/// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
/// set, this hook lets the target specify whether the instruction is actually
diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index 609f9af9767f5d..d34ee4f1040ffc 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -952,7 +952,9 @@ bool MachineSinking::isWorthBreakingCriticalEdge(
}
}
- return false;
+ // Let the target decide if it's worth breaking this
+ // critical edge for a "cheap" instruction.
+ return TII->shouldBreakCriticalEdgeToSink(MI);
}
bool MachineSinking::isLegalToBreakCriticalEdge(MachineInstr &MI,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 457db9b9860d00..4fb7e68e23d422 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -78,6 +78,10 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+ bool shouldBreakCriticalEdgeToSink(MachineInstr &MI) const override {
+ return MI.getOpcode() == RISCV::ADDI;
+ }
+
void copyPhysRegVector(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
MCRegister DstReg, MCRegister SrcReg, bool KillSrc,
diff --git a/llvm/test/CodeGen/RISCV/aext-to-sext.ll b/llvm/test/CodeGen/RISCV/aext-to-sext.ll
index 888ea666d71316..f3f71a923bdc29 100644
--- a/llvm/test/CodeGen/RISCV/aext-to-sext.ll
+++ b/llvm/test/CodeGen/RISCV/aext-to-sext.ll
@@ -78,12 +78,14 @@ bar:
define i64 @sext_phi_constants(i32 signext %c) {
; RV64I-LABEL: sext_phi_constants:
; RV64I: # %bb.0:
-; RV64I-NEXT: li a1, -1
-; RV64I-NEXT: bnez a0, .LBB2_2
-; RV64I-NEXT: # %bb.1: # %iffalse
-; RV64I-NEXT: li a1, -2
-; RV64I-NEXT: .LBB2_2: # %merge
-; RV64I-NEXT: slli a0, a1, 32
+; RV64I-NEXT: beqz a0, .LBB2_2
+; RV64I-NEXT: # %bb.1:
+; RV64I-NEXT: li a0, -1
+; RV64I-NEXT: j .LBB2_3
+; RV64I-NEXT: .LBB2_2: # %iffalse
+; RV64I-NEXT: li a0, -2
+; RV64I-NEXT: .LBB2_3: # %merge
+; RV64I-NEXT: slli a0, a0, 32
; RV64I-NEXT: srli a0, a0, 32
; RV64I-NEXT: ret
%a = icmp ne i32 %c, 0
diff --git a/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll b/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
index 775ea8e820afe3..f16bc0507cb480 100644
--- a/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
+++ b/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
@@ -184,13 +184,13 @@ declare i32 @toupper()
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) {
; CHECK-LABEL: overlap_live_ranges:
; CHECK: # %bb.0: # %bb
-; CHECK-NEXT: li a3, 1
-; CHECK-NEXT: li a2, 13
-; CHECK-NEXT: bne a1, a3, .LBB1_2
+; CHECK-NEXT: li a2, 1
+; CHECK-NEXT: bne a1, a2, .LBB1_2
; CHECK-NEXT: # %bb.1: # %bb2
-; CHECK-NEXT: lw a2, 4(a0)
-; CHECK-NEXT: .LBB1_2: # %bb5
-; CHECK-NEXT: mv a0, a2
+; CHECK-NEXT: lw a0, 4(a0)
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB1_2:
+; CHECK-NEXT: li a0, 13
; CHECK-NEXT: ret
bb:
%i = icmp eq i32 %arg1, 1
diff --git a/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll b/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll
index f69909e76d4c10..a2c572e07ff7d0 100644
--- a/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll
+++ b/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll
@@ -5,15 +5,13 @@ define signext i32 @mulw(i32 signext %s, i32 signext %n, i32 signext %k) nounwin
; CHECK-LABEL: mulw:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: li a2, 1
-; CHECK-NEXT: bge a0, a1, .LBB0_3
-; CHECK-NEXT: # %bb.1: # %for.body.preheader
-; CHECK-NEXT: li a2, 1
-; CHECK-NEXT: .LBB0_2: # %for.body
+; CHECK-NEXT: bge a0, a1, .LBB0_2
+; CHECK-NEXT: .LBB0_1: # %for.body
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: mulw a2, a0, a2
; CHECK-NEXT: addiw a0, a0, 1
-; CHECK-NEXT: blt a0, a1, .LBB0_2
-; CHECK-NEXT: .LBB0_3: # %for.cond.cleanup
+; CHECK-NEXT: blt a0, a1, .LBB0_1
+; CHECK-NEXT: .LBB0_2: # %for.cond.cleanup
; CHECK-NEXT: mv a0, a2
; CHECK-NEXT: ret
entry:
diff --git a/llvm/test/CodeGen/RISCV/select-const.ll b/llvm/test/CodeGen/RISCV/select-const.ll
index 792df6236ddc0e..b3e32b1f5c9c3d 100644
--- a/llvm/test/CodeGen/RISCV/select-const.ll
+++ b/llvm/test/CodeGen/RISCV/select-const.ll
@@ -61,22 +61,22 @@ define signext i32 @select_const_int_pow2_zero(i1 zeroext %a) nounwind {
define signext i32 @select_const_int_harder(i1 zeroext %a) nounwind {
; RV32-LABEL: select_const_int_harder:
; RV32: # %bb.0:
-; RV32-NEXT: mv a1, a0
-; RV32-NEXT: li a0, 6
-; RV32-NEXT: bnez a1, .LBB3_2
+; RV32-NEXT: bnez a0, .LBB3_2
; RV32-NEXT: # %bb.1:
; RV32-NEXT: li a0, 38
+; RV32-NEXT: ret
; RV32-NEXT: .LBB3_2:
+; RV32-NEXT: li a0, 6
; RV32-NEXT: ret
;
; RV64-LABEL: select_const_int_harder:
; RV64: # %bb.0:
-; RV64-NEXT: mv a1, a0
-; RV64-NEXT: li a0, 6
-; RV64-NEXT: bnez a1, .LBB3_2
+; RV64-NEXT: bnez a0, .LBB3_2
; RV64-NEXT: # %bb.1:
; RV64-NEXT: li a0, 38
+; RV64-NEXT: ret
; RV64-NEXT: .LBB3_2:
+; RV64-NEXT: li a0, 6
; RV64-NEXT: ret
%1 = select i1 %a, i32 6, i32 38
ret i32 %1
diff --git a/llvm/test/CodeGen/RISCV/select.ll b/llvm/test/CodeGen/RISCV/select.ll
index 8aa50cc0f39c1a..e3766c024197ef 100644
--- a/llvm/test/CodeGen/RISCV/select.ll
+++ b/llvm/test/CodeGen/RISCV/select.ll
@@ -1585,22 +1585,22 @@ define i32 @select_cst_not5(i32 signext %a, i32 signext %b) {
define i32 @select_cst_unknown(i32 signext %a, i32 signext %b) {
; RV32IM-LABEL: select_cst_unknown:
; RV32IM: # %bb.0:
-; RV32IM-NEXT: mv a2, a0
-; RV32IM-NEXT: li a0, 5
-; RV32IM-NEXT: blt a2, a1, .LBB42_2
+; RV32IM-NEXT: blt a0, a1, .LBB42_2
; RV32IM-NEXT: # %bb.1:
; RV32IM-NEXT: li a0, -7
+; RV32IM-NEXT: ret
; RV32IM-NEXT: .LBB42_2:
+; RV32IM-NEXT: li a0, 5
; RV32IM-NEXT: ret
;
; RV64IM-LABEL: select_cst_unknown:
; RV64IM: # %bb.0:
-; RV64IM-NEXT: mv a2, a0
-; RV64IM-NEXT: li a0, 5
-; RV64IM-NEXT: blt a2, a1, .LBB42_2
+; RV64IM-NEXT: blt a0, a1, .LBB42_2
; RV64IM-NEXT: # %bb.1:
; RV64IM-NEXT: li a0, -7
+; RV64IM-NEXT: ret
; RV64IM-NEXT: .LBB42_2:
+; RV64IM-NEXT: li a0, 5
; RV64IM-NEXT: ret
;
; RV64IMXVTCONDOPS-LABEL: select_cst_unknown:
@@ -1626,22 +1626,22 @@ define i32 @select_cst_unknown(i32 signext %a, i32 signext %b) {
define i32 @select_cst1(i1 zeroext %cond) {
; RV32IM-LABEL: select_cst1:
; RV32IM: # %bb.0:
-; RV32IM-NEXT: mv a1, a0
-; RV32IM-NEXT: li a0, 10
-; RV32IM-NEXT: bnez a1, .LBB43_2
+; RV32IM-NEXT: bnez a0, .LBB43_2
; RV32IM-NEXT: # %bb.1:
; RV32IM-NEXT: li a0, 20
+; RV32IM-NEXT: ret
; RV32IM-NEXT: .LBB43_2:
+; RV32IM-NEXT: li a0, 10
; RV32IM-NEXT: ret
;
; RV64IM-LABEL: select_cst1:
; RV64IM: # %bb.0:
-; RV64IM-NEXT: mv a1, a0
-; RV64IM-NEXT: li a0, 10
-; RV64IM-NEXT: bnez a1, .LBB43_2
+; RV64IM-NEXT: bnez a0, .LBB43_2
; RV64IM-NEXT: # %bb.1:
; RV64IM-NEXT: li a0, 20
+; RV64IM-NEXT: ret
; RV64IM-NEXT: .LBB43_2:
+; RV64IM-NEXT: li a0, 10
; RV64IM-NEXT: ret
;
; RV64IMXVTCONDOPS-LABEL: select_cst1:
@@ -1664,24 +1664,24 @@ define i32 @select_cst1(i1 zeroext %cond) {
define i32 @select_cst2(i1 zeroext %cond) {
; RV32IM-LABEL: select_cst2:
; RV32IM: # %bb.0:
-; RV32IM-NEXT: mv a1, a0
-; RV32IM-NEXT: li a0, 10
-; RV32IM-NEXT: bnez a1, .LBB44_2
+; RV32IM-NEXT: bnez a0, .LBB44_2
; RV32IM-NEXT: # %bb.1:
; RV32IM-NEXT: lui a0, 5
; RV32IM-NEXT: addi a0, a0, -480
+; RV32IM-NEXT: ret
; RV32IM-NEXT: .LBB44_2:
+; RV32IM-NEXT: li a0, 10
; RV32IM-NEXT: ret
;
; RV64IM-LABEL: select_cst2:
; RV64IM: # %bb.0:
-; RV64IM-NEXT: mv a1, a0
-; RV64IM-NEXT: li a0, 10
-; RV64IM-NEXT: bnez a1, .LBB44_2
+; RV64IM-NEXT: bnez a0, .LBB44_2
; RV64IM-NEXT: # %bb.1:
; RV64IM-NEXT: lui a0, 5
; RV64IM-NEXT: addiw a0, a0, -480
+; RV64IM-NEXT: ret
; RV64IM-NEXT: .LBB44_2:
+; RV64IM-NEXT: li a0, 10
; RV64IM-NEXT: ret
;
; RV64IMXVTCONDOPS-LABEL: select_cst2:
@@ -1782,24 +1782,24 @@ define i32 @select_cst4(i1 zeroext %cond) {
define i32 @select_cst5(i1 zeroext %cond) {
; RV32IM-LABEL: select_cst5:
; RV32IM: # %bb.0:
-; RV32IM-NEXT: mv a1, a0
-; RV32IM-NEXT: li a0, 2047
-; RV32IM-NEXT: bnez a1, .LBB47_2
+; RV32IM-NEXT: bnez a0, .LBB47_2
; RV32IM-NEXT: # %bb.1:
; RV32IM-NEXT: lui a0, 1
; RV32IM-NEXT: addi a0, a0, -2047
+; RV32IM-NEXT: ret
; RV32IM-NEXT: .LBB47_2:
+; RV32IM-NEXT: li a0, 2047
; RV32IM-NEXT: ret
;
; RV64IM-LABEL: select_cst5:
; RV64IM: # %bb.0:
-; RV64IM-NEXT: mv a1, a0
-; RV64IM-NEXT: li a0, 2047
-; RV64IM-NEXT: bnez a1, .LBB47_2
+; RV64IM-NEXT: bnez a0, .LBB47_2
; RV64IM-NEXT: # %bb.1:
; RV64IM-NEXT: lui a0, 1
; RV64IM-NEXT: addiw a0, a0, -2047
+; RV64IM-NEXT: ret
; RV64IM-NEXT: .LBB47_2:
+; RV64IM-NEXT: li a0, 2047
; RV64IM-NEXT: ret
;
; RV64IMXVTCONDOPS-LABEL: select_cst5:
diff --git a/llvm/test/CodeGen/RISCV/sextw-removal.ll b/llvm/test/CodeGen/RISCV/sextw-removal.ll
index 8cf78551d28f98..58c3dd18875571 100644
--- a/llvm/test/CodeGen/RISCV/sextw-removal.ll
+++ b/llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1032,82 +1032,86 @@ bb7: ; preds = %bb2
define signext i32 @bug(i32 signext %x) {
; CHECK-LABEL: bug:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: beqz a0, .LBB18_4
+; CHECK-NEXT: beqz a0, .LBB18_5
; CHECK-NEXT: # %bb.1: # %if.end
-; CHECK-NEXT: srliw a2, a0, 16
-; CHECK-NEXT: seqz a1, a2
-; CHECK-NEXT: slli a1, a1, 4
-; CHECK-NEXT: sllw a1, a0, a1
-; CHECK-NEXT: li a0, 16
-; CHECK-NEXT: beqz a2, .LBB18_3
+; CHECK-NEXT: srliw a1, a0, 16
+; CHECK-NEXT: seqz a2, a1
+; CHECK-NEXT: slli a2, a2, 4
+; CHECK-NEXT: sllw a0, a0, a2
+; CHECK-NEXT: beqz a1, .LBB18_3
; CHECK-NEXT: # %bb.2: # %if.end
-; CHECK-NEXT: li a0, 32
-; CHECK-NEXT: .LBB18_3: # %if.end
-; CHECK-NEXT: srliw a2, a1, 24
+; CHECK-NEXT: li a1, 32
+; CHECK-NEXT: j .LBB18_4
+; CHECK-NEXT: .LBB18_3:
+; CHECK-NEXT: li a1, 16
+; CHECK-NEXT: .LBB18_4: # %if.end
+; CHECK-NEXT: srliw a2, a0, 24
; CHECK-NEXT: seqz a2, a2
; CHECK-NEXT: slli a3, a2, 3
-; CHECK-NEXT: sllw a1, a1, a3
+; CHECK-NEXT: sllw a0, a0, a3
; CHECK-NEXT: negw a2, a2
; CHECK-NEXT: andi a2, a2, -8
-; CHECK-NEXT: add a0, a0, a2
-; CHECK-NEXT: srliw a2, a1, 28
+; CHECK-NEXT: add a1, a1, a2
+; CHECK-NEXT: srliw a2, a0, 28
; CHECK-NEXT: seqz a2, a2
; CHECK-NEXT: slli a3, a2, 2
-; CHECK-NEXT: sllw a1, a1, a3
+; CHECK-NEXT: sllw a0, a0, a3
; CHECK-NEXT: negw a2, a2
; CHECK-NEXT: andi a2, a2, -4
-; CHECK-NEXT: add a0, a0, a2
-; CHECK-NEXT: srliw a2, a1, 30
+; CHECK-NEXT: add a1, a1, a2
+; CHECK-NEXT: srliw a2, a0, 30
; CHECK-NEXT: seqz a2, a2
; CHECK-NEXT: slli a3, a2, 1
-; CHECK-NEXT: sllw a1, a1, a3
+; CHECK-NEXT: sllw a0, a0, a3
; CHECK-NEXT: negw a2, a2
; CHECK-NEXT: andi a2, a2, -2
-; CHECK-NEXT: add a0, a0, a2
-; CHECK-NEXT: not a1, a1
-; CHECK-NEXT: srli a1, a1, 31
-; CHECK-NEXT: addw a0, a0, a1
-; CHECK-NEXT: .LBB18_4: # %cleanup
+; CHECK-NEXT: add a1, a1, a2
+; CHECK-NEXT: not a0, a0
+; CHECK-NEXT: srli a0, a0, 31
+; CHECK-NEXT: addw a0, a1, a0
+; CHECK-NEXT: .LBB18_5: # %cleanup
; CHECK-NEXT: ret
;
; NOREMOVAL-LABEL: bug:
; NOREMOVAL: # %bb.0: # %entry
-; NOREMOVAL-NEXT: beqz a0, .LBB18_4
+; NOREMOVAL-NEXT: beqz a0, .LBB18_5
; NOREMOVAL-NEXT: # %bb.1: # %if.end
-; NOREMOVAL-NEXT: srliw a2, a0, 16
-; NOREMOVAL-NEXT: seqz a1, a2
-; NOREMOVAL-NEXT: slli a1, a1, 4
-; NOREMOVAL-NEXT: sllw a1, a0, a1
-; NOREMOVAL-NEXT: li a0, 16
-; NOREMOVAL-NEXT: beqz a2, .LBB18_3
+; NOREMOVAL-NEXT: srliw a1, a0, 16
+; NOREMOVAL-NEXT: seqz a2, a1
+; NOREMOVAL-NEXT: slli a2, a2, 4
+; NOREMOVAL-NEXT: sllw a0, a0, a2
+; NOREMOVAL-NEXT: beqz a1, .LBB18_3
; NOREMOVAL-NEXT: # %bb.2: # %if.end
-; NOREMOVAL-NEXT: li a0, 32
-; NOREMOVAL-NEXT: .LBB18_3: # %if.end
-; NOREMOVAL-NEXT: srliw a2, a1, 24
+; NOREMOVAL-NEXT: li a1, 32
+; NOREMOVAL-NEXT: j .LBB18_4
+; NOREMOVAL-NEXT: .LBB18_3:
+; NOREMOVAL-NEXT: li a1, 16
+; NOREMOVAL-NEXT: .LBB18_4: # %if.end
+; NOREMOVAL-NEXT: srliw a2, a0, 24
; NOREMOVAL-NEXT: seqz a2, a2
; NOREMOVAL-NEXT: slli a3, a2, 3
-; NOREMOVAL-NEXT: sllw a1, a1, a3
+; NOREMOVAL-NEXT: sllw a0, a0, a3
; NOREMOVAL-NEXT: negw a2, a2
; NOREMOVAL-NEXT: andi a2, a2, -8
-; NOREMOVAL-NEXT: add a0, a0, a2
-; NOREMOVAL-NEXT: srliw a2, a1, 28
+; NOREMOVAL-NEXT: add a1, a1, a2
+; NOREMOVAL-NEXT: srliw a2, a0, 28
; NOREMOVAL-NEXT: seqz a2, a2
; NOREMOVAL-NEXT: slli a3, a2, 2
-; NOREMOVAL-NEXT: sllw a1, a1, a3
+; NOREMOVAL-NEXT: sllw a0, a0, a3
; NOREMOVAL-NEXT: negw a2, a2
; NOREMOVAL-NEXT: andi a2, a2, -4
-; NOREMOVAL-NEXT: add a0, a0, a2
-; NOREMOVAL-NEXT: srliw a2, a1, 30
+; NOREMOVAL-NEXT: add a1, a1, a2
+; NOREMOVAL-NEXT: srliw a2, a0, 30
; NOREMOVAL-NEXT: seqz a2, a2
; NOREMOVAL-NEXT: slli a3, a2, 1
-; NOREMOVAL-NEXT: sllw a1, a1, a3
+; NOREMOVAL-NEXT: sllw a0, a0, a3
; NOREMOVAL-NEXT: negw a2, a2
; NOREMOVAL-NEXT: andi a2, a2, -2
-; NOREMOVAL-NEXT: add a0, a0, a2
-; NOREMOVAL-NEXT: not a1, a1
-; NOREMOVAL-NEXT: srli a1, a1, 31
-; NOREMOVAL-NEXT: addw a0, a0, a1
-; NOREMOVAL-NEXT: .LBB18_4: # %cleanup
+; NOREMOVAL-NEXT: add a1, a1, a2
+; NOREMOVAL-NEXT: not a0, a0
+; NOREMOVAL-NEXT: srli a0, a0, 31
+; NOREMOVAL-NEXT: addw a0, a1, a0
+; NOREMOVAL-NEXT: .LBB18_5: # %cleanup
; NOREMOVAL-NEXT: ret
entry:
%tobool.not = icmp eq i32 %x, 0
diff --git a/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
index ec7e0ecce80caa..ae1aabed498059 100644
--- a/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
+++ b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
@@ -7,13 +7,14 @@ define zeroext i16 @overflow_add(i16 zeroext %a, i16 zeroext %b) {
; CHECK-NEXT: add a0, a1, a0
; CHECK-NEXT: ori a0, a0, 1
; CHECK-NEXT: slli a0, a0, 48
-; CHECK-NEXT: srli a1, a0, 48
-; CHECK-NEXT: li a2, 1024
-; CHECK-NEXT: li a0, 2
-; CHECK-NEXT: bltu a2, a1, .LBB0_2
+; CHECK-NEXT: srli a0, a0, 48
+; CHECK-NEXT: li a1, 1024
+; CHECK-NEXT: bltu a1, a0, .LBB0_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB0_2:
+; CHECK-NEXT: li a0, 2
; CHECK-NEXT: ret
%add = add i16 %b, %a
%or = or i16 %add, 1
@@ -28,13 +29,14 @@ define zeroext i16 @overflow_sub(i16 zeroext %a, i16 zeroext %b) {
; CHECK-NEXT: subw a0, a0, a1
; CHECK-NEXT: ori a0, a0, 1
; CHECK-NEXT: slli a0, a0, 48
-; CHECK-NEXT: srli a1, a0, 48
-; CHECK-NEXT: li a2, 1024
-; CHECK-NEXT: li a0, 2
-; CHECK-NEXT: bltu a2, a1, .LBB1_2
+; CHECK-NEXT: srli a0, a0, 48
+; CHECK-NEXT: li a1, 1024
+; CHECK-NEXT: bltu a1, a0, .LBB1_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB1_2:
+; CHECK-NEXT: li a0, 2
; CHECK-NEXT: ret
%add = sub i16 %a, %b
%or = or i16 %add, 1
@@ -49,13 +51,14 @@ define zeroext i16 @overflow_mul(i16 zeroext %a, i16 zeroext %b) {
; CHECK-NEXT: mul a0, a1, a0
; CHECK-NEXT: ori a0, a0, 1
; CHECK-NEXT: slli a0, a0, 48
-; CHECK-NEXT: srli a1, a0, 48
-; CHECK-NEXT: li a2, 1024
-; CHECK-NEXT: li a0, 2
-; CHECK-NEXT: bltu a2, a1, .LBB2_2
+; CHECK-NEXT: srli a0, a0, 48
+; CHECK-NEXT: li a1, 1024
+; CHECK-NEXT: bltu a1, a0, .LBB2_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB2_2:
+; CHECK-NEXT: li a0, 2
; CHECK-NEXT: ret
%add = mul i16 %b, %a
%or = or i16 %add, 1
@@ -70,13 +73,14 @@ define zeroext i16 @overflow_shl(i16 zeroext %a, i16 zeroext %b) {
; CHECK-NEXT: sll a0, a0, a1
; CHECK-NEXT: ori a0, a0, 1
; CHECK-NEXT: slli a0, a0, 48
-; CHECK-NEXT: srli a1, a0, 48
-; CHECK-NEXT: li a2, 1024
-; CHECK-NEXT: li a0, 2
-; CHECK-NEXT: bltu a2, a1, .LBB3_2
+; CHECK-NEXT: srli a0, a0, 48
+; CHECK-NEXT: li a1, 1024
+; CHECK-NEXT: bltu a1, a0, .LBB3_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB3_2:
+; CHECK-NEXT: li a0, 2
; CHECK-NEXT: ret
%add = shl i16 %a, %b
%or = or i16 %add, 1
@@ -89,12 +93,13 @@ define i32 @overflow_add_no_consts(i8 zeroext %a, i8 zeroext %b, i8 zeroext %lim
; CHECK-LABEL: overflow_add_no_consts:
; CHECK: # %bb.0:
; CHECK-NEXT: add a0, a1, a0
-; CHECK-NEXT: andi a1, a0, 255
-; CHECK-NEXT: li a0, 8
-; CHECK-NEXT: bltu a2, a1, .LBB4_2
+; CHECK-NEXT: andi a0, a0, 255
+; CHECK-NEXT: bltu a2, a0, .LBB4_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB4_2:
+; CHECK-NEXT: li a0, 8
; CHECK-NEXT: ret
%add = add i8 %b, %a
%cmp = icmp ugt i8 %add, %limit
@@ -106,13 +111,14 @@ define i32 @overflow_add_const_limit(i8 zeroext %a, i8 zeroext %b) {
; CHECK-LABEL: overflow_add_const_limit:
; CHECK: # %bb.0:
; CHECK-NEXT: add a0, a1, a0
-; CHECK-NEXT: andi a1, a0, 255
-; CHECK-NEXT: li a2, 128
-; CHECK-NEXT: li a0, 8
-; CHECK-NEXT: bltu a2, a1, .LBB5_2
+; CHECK-NEXT: andi a0, a0, 255
+; CHECK-NEXT: li a1, 128
+; CHECK-NEXT: bltu a1, a0, .LBB5_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB5_2:
+; CHECK-NEXT: li a0, 8
; CHECK-NEXT: ret
%add = add i8 %b, %a
%cmp = icmp ugt i8 %add, -128
@@ -124,13 +130,14 @@ define i32 @overflow_add_positive_const_limit(i8 zeroext %a) {
; CHECK-LABEL: overflow_add_positive_const_limit:
; CHECK: # %bb.0:
; CHECK-NEXT: slli a0, a0, 56
-; CHECK-NEXT: srai a1, a0, 56
-; CHECK-NEXT: li a2, -1
-; CHECK-NEXT: li a0, 8
-; CHECK-NEXT: blt a1, a2, .LBB6_2
+; CHECK-NEXT: srai a0, a0, 56
+; CHECK-NEXT: li a1, -1
+; CHECK-NEXT: blt a0, a1, .LBB6_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB6_2:
+; CHECK-NEXT: li a0, 8
; CHECK-NEXT: ret
%cmp = icmp slt i8 %a, -1
%res = select i1 %cmp, i32 8, i32 16
@@ -140,13 +147,13 @@ define i32 @overflow_add_positive_const_limit(i8 zeroext %a) {
define i32 @unsafe_add_underflow(i8 zeroext %a) {
; CHECK-LABEL: unsafe_add_underflow:
; CHECK: # %bb.0:
-; CHECK-NEXT: mv a1, a0
-; CHECK-NEXT: li a2, 1
-; CHECK-NEXT: li a0, 8
-; CHECK-NEXT: beq a1, a2, .LBB7_2
+; CHECK-NEXT: li a1, 1
+; CHECK-NEXT: beq a0, a1, .LBB7_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB7_2:
+; CHECK-NEXT: li a0, 8
; CHECK-NEXT: ret
%cmp = icmp eq i8 %a, 1
%res = select i1 %cmp, i32 8, i32 16
@@ -156,12 +163,12 @@ define i32 @unsafe_add_underflow(i8 zeroext %a) {
define i32 @safe_add_underflow(i8 zeroext %a) {
; CHECK-LABEL: safe_add_underflow:
; CHECK: # %bb.0:
-; CHECK-NEXT: mv a1, a0
-; CHECK-NEXT: li a0, 8
-; CHECK-NEXT: beqz a1, .LBB8_2
+; CHECK-NEXT: beqz a0, .LBB8_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: ret
; CHECK-NEXT: .LBB8_2:
+; CHECK-NEXT: li a0, 8
; CHECK-NEXT: ret
%cmp = icmp eq i8 %a, 0
%res = select i1 %cmp, i32 8, i32 16
@@ -171,13 +178,14 @@ define i32 @safe_add_underflow(i8 zeroext %a) {
define i32 @safe_add_underflow_neg(i8 zeroext %a) {
; CHECK-LABEL: safe_add_underflow_neg:
; CHECK: # %bb.0:
-; CHECK-N...
[truncated]
Looks like I screwed up this PR - I'd not updated all the tests. Working on fixing that, will update once done.
✅ With the latest revision this PR passed the C/C++ code formatter.
```diff
 define i32 @f_medium_ledge_pos(i32 %in0) minsize {
-  %cmp = icmp CMPCOND i32 %in0, 32
+  %cmp = icmp CMPCOND i32 %in0, 33
```
The test delta in this file needs a bit of explanation. Most of the diff is me trying to understand the testing to find the actual problem. The only actual change is this line.

Essentially, the original test is confused about what it's testing. "32" can be handled via the c.addi offset scheme. However, here's the original assembly we got:

```asm
        addi    a1, a0, -32
        li      a0, -99
        bnez    a1, .LBB0_2
# %bb.1:
        li      a0, 42
.LBB0_2:
        ret
```

Note that the c.addi form can only be matched when the source and destination registers are the same.

Once this change sinks the unrelated LI out of the way, the register allocator can do that, and we end up with a compressed addi.

From the naming of the test and the comment, this doesn't appear to have been the intended result, so I adjusted the constant to 33. If reviewers agree that this is the right test update, I'm going to land a rework of this test file separately.
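A minimal illustration of the rd == rs1 constraint (register choices are illustrative):

```asm
addi    a1, a0, -32    # rd != rs1: can never use c.addi, always 4 bytes
addi    a0, a0, -32    # rd == rs1: assembles to the 2-byte c.addi when the imm fits
```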
I agree this is the right test update.
I landed these changes on main, and eventually figured out we could autogen this test without losing the spirit of the compressed checks. The resulting diff (there is a much bigger one than even I'd realized) should be a lot more obvious now. They do look to be improvements.
```diff
@@ -91,14 +87,14 @@ define i32 @f_small_edge_neg(i32 %in0) minsize {
 ; constant is medium and not fit in 6 bit (compress imm),
 ; but fit in 12 bit (imm)
 ; RV32IFDC-LABEL: <f_medium_ledge_pos>:
-; RV32IFDC: addi [[MAYZEROREG:.*]], [[REG:.*]], -0x20
 ; RV32IFDC: RESBROPT [[MAYZEROREG]], [[PLACE:.*]]
+; RV32IFDC: addi [[MAYZEROREG:.*]], [[REG:.*]], -0x21
```
Is REG here zero, like in the RV32IFD case?
Nevermind. It's not.
Per the comment, this test is intending to test the first constant which can't be encoded via a c.addi. However, -32 *can* be encoded in a c.addi, and all that's preventing it from doing so is the register allocator's choice to use a different destination register for the add than its source (which the compressed form doesn't support). The current llc codegen for this test looks like:

```asm
        addi    a1, a0, -32
        li      a0, -99
        bnez    a1, .LBB0_2
        li      a0, 42
.LBB0_2:
        ret
```

After #108889, we sink the LI, and the register allocator picks the same source and destination register for the addi, resulting in the c.addi form being emitted. So, to avoid a confusing diff, let's fix the test to check what was originally intended.
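As a sketch of the encoding constraint at play (operands are illustrative, not from the test file): the c.addi immediate is a nonzero 6-bit signed value, i.e. in [-32, 31].

```asm
addi    a0, a0, -32    # compare against 32: -32 encodes, compressible once rd == rs1
addi    a0, a0, -33    # compare against 33: -33 is out of range, always a 4-byte addi
```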
Looking at the diff for this with the GCC torture suite, I think there's one case where this change makes codegen slightly worse, which would be worth a quick look - 20020413-1.c. Previously, each conditional BB loaded an immediate and then branched to the common exit block. After this change, the BBs conditionally branch to the BB that loads the needed immediate, which then branches to the common exit block. I don't think this is a blocker, as the change is positive overall.
I took a look at this, and it's basically the patch working as intended. One interesting quirk is that it turned out to be specific to having a series of branches. With branch dispatch you see a code change with this patch. With switch dispatch, we already sink into the successor blocks (by construction). So, if anything, it's another reason to land this patch - consistency! Can I get an LGTM here?
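To make the dispatch distinction concrete, here's a hedged sketch (names and constants are illustrative; this is not the actual 20020413-1.c code):

```llvm
; Branch dispatch: the constants 10/20/0 flow into a phi across
; critical edges, so sinking them requires splitting those edges.
define i32 @branch_dispatch(i32 %x) {
entry:
  %c1 = icmp eq i32 %x, 1
  br i1 %c1, label %merge, label %next
next:
  %c2 = icmp eq i32 %x, 2
  br i1 %c2, label %merge, label %other
other:
  br label %merge
merge:
  %r = phi i32 [ 10, %entry ], [ 20, %next ], [ 0, %other ]
  ret i32 %r
}

; Switch dispatch: each case already has its own block, so the
; immediates sink into the successors without any edge splitting.
define i32 @switch_dispatch(i32 %x) {
entry:
  switch i32 %x, label %other [ i32 1, label %is1
                                i32 2, label %is2 ]
is1:
  br label %merge
is2:
  br label %merge
other:
  br label %merge
merge:
  %r = phi i32 [ 10, %is1 ], [ 20, %is2 ], [ 0, %other ]
  ret i32 %r
}
```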
LGTM
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/11013

Here is the relevant piece of the build log for reference:
This test has been failing since [RISCV] Implement tail call optimization in machine outliner (PR llvm#115297) landed. Changes to the same file were merged earlier today in [TTI][RISCV] Unconditionally break critical edges to sink ADDI (PR llvm#108889).
@topperc and I are finding that this change is causing a performance regression in dhrystone and perlbench_r. It leads to splitting the edge and sinking, which creates a new basic block and adds an extra branch that takes additional cycles. Would it be possible to use better heuristics for when to split the edges? I reduced the test case.
With this reduced example, we used to generate:

Now we generate:

In this reduced case, we end up with an extra basic block (and consequently an extra jump) as a result of this PR. Please note that I found this in the Perl_sv_cmp_flags function of perlbench, but when this patch is reverted, it helps the dynamic IC in S_regmatch much more than in Perl_sv_cmp_flags.
@michaelmaitland - I will take a look at this tomorrow, and get back to you.
@michaelmaitland Can I ask you to file a bug for the regression you noted? I think this may have been reduced a bit past usefulness. Looking at the "after" assembly, the "right" answer for this case is for this:

to become a snez. However, when I opened up the pre-codegen IR for the routine you mentioned, I don't see this pattern in the IR. I'm not claiming that my particular snapshot from some time ago is evidence against there being a regression here. I'm just saying I need to know how to reproduce the original example, not just the reduced case, which may or may not be representative. I suspect that e.g. LTO vs non-LTO might be very important here.

For the record, I do not plan to revert this without significant investigation first. It's been four months since this landed.