[TTI][RISCV] Unconditionally break critical edges to sink ADDI #108889

Merged
merged 8 commits into from
Nov 26, 2024

Conversation

preames
Collaborator

@preames preames commented Sep 16, 2024

This looks like a rather weird change, so let me explain why this isn't as unreasonable as it looks. Let's start with the problem it's solving.

define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) {
bb:
  %i = icmp eq i32 %arg1, 1
  br i1 %i, label %bb2, label %bb5

bb2:                                              ; preds = %bb
  %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
  %i4 = load i32, ptr %i3, align 4
  br label %bb5

bb5:                                              ; preds = %bb2, %bb
  %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
  ret i32 %i6
}

Right now, we codegen this as:

	li	a3, 1
	li	a2, 13
	bne	a1, a3, .LBB0_2
	lw	a2, 4(a0)
.LBB0_2:
	mv	a0, a2
	ret

In this example, we have two values which must be assigned to a0 per the ABI (%arg and the return value). SelectionDAG ensures that all values used in a successor phi are defined before exiting the predecessor block. This creates an ADDI to materialize the immediate in the entry block.

Currently, this ADDI is not sunk into the tail block because we'd have to split a critical edge to do so. Note that if our immediate were anything large enough to require two instructions, we would split this critical edge.

Looking at other targets, we notice that they don't seem to have this problem: they perform the sinking and tail duplication that we don't. Why? Well, it turns out that for AArch64 this is entirely an accident of the existence of the gpr32all register class. The immediate is materialized into the gpr32 class and then copied into the gpr32all register class. The existence of that copy puts us right back into the two-instruction case noted above.

This change essentially just bypasses that emergent aspect of the AArch64 behavior and implements the same "always sink immediates" behavior for RISCV as well.

; RV64-NEXT: .LBB3_2:
; RV64-NEXT: li a0, 6
Collaborator Author

These select lowering changes are arguably real regressions. We are losing the opportunity to allocate the untaken result into a register and then conditionally overwrite it. The tail duplication probably wouldn't happen in real code, so we'd end up emitting an extra unconditional jump to bypass the critical edge we split before the join.

Specifically for these selects in tests, most can be done via arithmetic expansions instead. I started to implement that, but ended up pulling on a few too many intertwined changes. I do plan to go back to this, but I don't consider the select impact blocking here.

@llvmbot
Member

llvmbot commented Sep 16, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes



Patch is 23.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108889.diff

10 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+6)
  • (modified) llvm/lib/CodeGen/MachineSink.cpp (+3-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+4)
  • (modified) llvm/test/CodeGen/RISCV/aext-to-sext.ll (+8-6)
  • (modified) llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll (+4-6)
  • (modified) llvm/test/CodeGen/RISCV/select-const.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/select.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/sextw-removal.ll (+48-44)
  • (modified) llvm/test/CodeGen/RISCV/typepromotion-overflow.ll (+69-57)
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index a3bfc63f2a4790..aef2c1748a099d 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -159,6 +159,12 @@ class TargetInstrInfo : public MCInstrInfo {
     return true;
   }
 
+  /// For a "cheap" instruction which doesn't enable additional sinking,
+  /// should MachineSink break a critical edge to sink it anyways?
+  virtual bool shouldBreakCriticalEdgeToSink(MachineInstr &MI) const {
+    return false;
+  }
+
 protected:
   /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
   /// set, this hook lets the target specify whether the instruction is actually
diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index 609f9af9767f5d..d34ee4f1040ffc 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -952,7 +952,9 @@ bool MachineSinking::isWorthBreakingCriticalEdge(
     }
   }
 
-  return false;
+  // Let the target decide if it's worth breaking this
+  // critical edge for a "cheap" instruction.
+  return TII->shouldBreakCriticalEdgeToSink(MI);
 }
 
 bool MachineSinking::isLegalToBreakCriticalEdge(MachineInstr &MI,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 457db9b9860d00..4fb7e68e23d422 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -78,6 +78,10 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
 
   bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
 
+  bool shouldBreakCriticalEdgeToSink(MachineInstr &MI) const override {
+    return MI.getOpcode() == RISCV::ADDI;
+  }
+
   void copyPhysRegVector(MachineBasicBlock &MBB,
                          MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
                          MCRegister DstReg, MCRegister SrcReg, bool KillSrc,
diff --git a/llvm/test/CodeGen/RISCV/aext-to-sext.ll b/llvm/test/CodeGen/RISCV/aext-to-sext.ll
index 888ea666d71316..f3f71a923bdc29 100644
--- a/llvm/test/CodeGen/RISCV/aext-to-sext.ll
+++ b/llvm/test/CodeGen/RISCV/aext-to-sext.ll
@@ -78,12 +78,14 @@ bar:
 define i64 @sext_phi_constants(i32 signext %c) {
 ; RV64I-LABEL: sext_phi_constants:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    li a1, -1
-; RV64I-NEXT:    bnez a0, .LBB2_2
-; RV64I-NEXT:  # %bb.1: # %iffalse
-; RV64I-NEXT:    li a1, -2
-; RV64I-NEXT:  .LBB2_2: # %merge
-; RV64I-NEXT:    slli a0, a1, 32
+; RV64I-NEXT:    beqz a0, .LBB2_2
+; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    li a0, -1
+; RV64I-NEXT:    j .LBB2_3
+; RV64I-NEXT:  .LBB2_2: # %iffalse
+; RV64I-NEXT:    li a0, -2
+; RV64I-NEXT:  .LBB2_3: # %merge
+; RV64I-NEXT:    slli a0, a0, 32
 ; RV64I-NEXT:    srli a0, a0, 32
 ; RV64I-NEXT:    ret
   %a = icmp ne i32 %c, 0
diff --git a/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll b/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
index 775ea8e820afe3..f16bc0507cb480 100644
--- a/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
+++ b/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
@@ -184,13 +184,13 @@ declare i32 @toupper()
 define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) {
 ; CHECK-LABEL: overlap_live_ranges:
 ; CHECK:       # %bb.0: # %bb
-; CHECK-NEXT:    li a3, 1
-; CHECK-NEXT:    li a2, 13
-; CHECK-NEXT:    bne a1, a3, .LBB1_2
+; CHECK-NEXT:    li a2, 1
+; CHECK-NEXT:    bne a1, a2, .LBB1_2
 ; CHECK-NEXT:  # %bb.1: # %bb2
-; CHECK-NEXT:    lw a2, 4(a0)
-; CHECK-NEXT:  .LBB1_2: # %bb5
-; CHECK-NEXT:    mv a0, a2
+; CHECK-NEXT:    lw a0, 4(a0)
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB1_2:
+; CHECK-NEXT:    li a0, 13
 ; CHECK-NEXT:    ret
 bb:
   %i = icmp eq i32 %arg1, 1
diff --git a/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll b/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll
index f69909e76d4c10..a2c572e07ff7d0 100644
--- a/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll
+++ b/llvm/test/CodeGen/RISCV/rv64m-w-insts-legalization.ll
@@ -5,15 +5,13 @@ define signext i32 @mulw(i32 signext %s, i32 signext %n, i32 signext %k) nounwin
 ; CHECK-LABEL: mulw:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    li a2, 1
-; CHECK-NEXT:    bge a0, a1, .LBB0_3
-; CHECK-NEXT:  # %bb.1: # %for.body.preheader
-; CHECK-NEXT:    li a2, 1
-; CHECK-NEXT:  .LBB0_2: # %for.body
+; CHECK-NEXT:    bge a0, a1, .LBB0_2
+; CHECK-NEXT:  .LBB0_1: # %for.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    mulw a2, a0, a2
 ; CHECK-NEXT:    addiw a0, a0, 1
-; CHECK-NEXT:    blt a0, a1, .LBB0_2
-; CHECK-NEXT:  .LBB0_3: # %for.cond.cleanup
+; CHECK-NEXT:    blt a0, a1, .LBB0_1
+; CHECK-NEXT:  .LBB0_2: # %for.cond.cleanup
 ; CHECK-NEXT:    mv a0, a2
 ; CHECK-NEXT:    ret
 entry:
diff --git a/llvm/test/CodeGen/RISCV/select-const.ll b/llvm/test/CodeGen/RISCV/select-const.ll
index 792df6236ddc0e..b3e32b1f5c9c3d 100644
--- a/llvm/test/CodeGen/RISCV/select-const.ll
+++ b/llvm/test/CodeGen/RISCV/select-const.ll
@@ -61,22 +61,22 @@ define signext i32 @select_const_int_pow2_zero(i1 zeroext %a) nounwind {
 define signext i32 @select_const_int_harder(i1 zeroext %a) nounwind {
 ; RV32-LABEL: select_const_int_harder:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    mv a1, a0
-; RV32-NEXT:    li a0, 6
-; RV32-NEXT:    bnez a1, .LBB3_2
+; RV32-NEXT:    bnez a0, .LBB3_2
 ; RV32-NEXT:  # %bb.1:
 ; RV32-NEXT:    li a0, 38
+; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB3_2:
+; RV32-NEXT:    li a0, 6
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: select_const_int_harder:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    mv a1, a0
-; RV64-NEXT:    li a0, 6
-; RV64-NEXT:    bnez a1, .LBB3_2
+; RV64-NEXT:    bnez a0, .LBB3_2
 ; RV64-NEXT:  # %bb.1:
 ; RV64-NEXT:    li a0, 38
+; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB3_2:
+; RV64-NEXT:    li a0, 6
 ; RV64-NEXT:    ret
   %1 = select i1 %a, i32 6, i32 38
   ret i32 %1
diff --git a/llvm/test/CodeGen/RISCV/select.ll b/llvm/test/CodeGen/RISCV/select.ll
index 8aa50cc0f39c1a..e3766c024197ef 100644
--- a/llvm/test/CodeGen/RISCV/select.ll
+++ b/llvm/test/CodeGen/RISCV/select.ll
@@ -1585,22 +1585,22 @@ define i32 @select_cst_not5(i32 signext %a, i32 signext %b) {
 define i32 @select_cst_unknown(i32 signext %a, i32 signext %b) {
 ; RV32IM-LABEL: select_cst_unknown:
 ; RV32IM:       # %bb.0:
-; RV32IM-NEXT:    mv a2, a0
-; RV32IM-NEXT:    li a0, 5
-; RV32IM-NEXT:    blt a2, a1, .LBB42_2
+; RV32IM-NEXT:    blt a0, a1, .LBB42_2
 ; RV32IM-NEXT:  # %bb.1:
 ; RV32IM-NEXT:    li a0, -7
+; RV32IM-NEXT:    ret
 ; RV32IM-NEXT:  .LBB42_2:
+; RV32IM-NEXT:    li a0, 5
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: select_cst_unknown:
 ; RV64IM:       # %bb.0:
-; RV64IM-NEXT:    mv a2, a0
-; RV64IM-NEXT:    li a0, 5
-; RV64IM-NEXT:    blt a2, a1, .LBB42_2
+; RV64IM-NEXT:    blt a0, a1, .LBB42_2
 ; RV64IM-NEXT:  # %bb.1:
 ; RV64IM-NEXT:    li a0, -7
+; RV64IM-NEXT:    ret
 ; RV64IM-NEXT:  .LBB42_2:
+; RV64IM-NEXT:    li a0, 5
 ; RV64IM-NEXT:    ret
 ;
 ; RV64IMXVTCONDOPS-LABEL: select_cst_unknown:
@@ -1626,22 +1626,22 @@ define i32 @select_cst_unknown(i32 signext %a, i32 signext %b) {
 define i32 @select_cst1(i1 zeroext %cond) {
 ; RV32IM-LABEL: select_cst1:
 ; RV32IM:       # %bb.0:
-; RV32IM-NEXT:    mv a1, a0
-; RV32IM-NEXT:    li a0, 10
-; RV32IM-NEXT:    bnez a1, .LBB43_2
+; RV32IM-NEXT:    bnez a0, .LBB43_2
 ; RV32IM-NEXT:  # %bb.1:
 ; RV32IM-NEXT:    li a0, 20
+; RV32IM-NEXT:    ret
 ; RV32IM-NEXT:  .LBB43_2:
+; RV32IM-NEXT:    li a0, 10
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: select_cst1:
 ; RV64IM:       # %bb.0:
-; RV64IM-NEXT:    mv a1, a0
-; RV64IM-NEXT:    li a0, 10
-; RV64IM-NEXT:    bnez a1, .LBB43_2
+; RV64IM-NEXT:    bnez a0, .LBB43_2
 ; RV64IM-NEXT:  # %bb.1:
 ; RV64IM-NEXT:    li a0, 20
+; RV64IM-NEXT:    ret
 ; RV64IM-NEXT:  .LBB43_2:
+; RV64IM-NEXT:    li a0, 10
 ; RV64IM-NEXT:    ret
 ;
 ; RV64IMXVTCONDOPS-LABEL: select_cst1:
@@ -1664,24 +1664,24 @@ define i32 @select_cst1(i1 zeroext %cond) {
 define i32 @select_cst2(i1 zeroext %cond) {
 ; RV32IM-LABEL: select_cst2:
 ; RV32IM:       # %bb.0:
-; RV32IM-NEXT:    mv a1, a0
-; RV32IM-NEXT:    li a0, 10
-; RV32IM-NEXT:    bnez a1, .LBB44_2
+; RV32IM-NEXT:    bnez a0, .LBB44_2
 ; RV32IM-NEXT:  # %bb.1:
 ; RV32IM-NEXT:    lui a0, 5
 ; RV32IM-NEXT:    addi a0, a0, -480
+; RV32IM-NEXT:    ret
 ; RV32IM-NEXT:  .LBB44_2:
+; RV32IM-NEXT:    li a0, 10
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: select_cst2:
 ; RV64IM:       # %bb.0:
-; RV64IM-NEXT:    mv a1, a0
-; RV64IM-NEXT:    li a0, 10
-; RV64IM-NEXT:    bnez a1, .LBB44_2
+; RV64IM-NEXT:    bnez a0, .LBB44_2
 ; RV64IM-NEXT:  # %bb.1:
 ; RV64IM-NEXT:    lui a0, 5
 ; RV64IM-NEXT:    addiw a0, a0, -480
+; RV64IM-NEXT:    ret
 ; RV64IM-NEXT:  .LBB44_2:
+; RV64IM-NEXT:    li a0, 10
 ; RV64IM-NEXT:    ret
 ;
 ; RV64IMXVTCONDOPS-LABEL: select_cst2:
@@ -1782,24 +1782,24 @@ define i32 @select_cst4(i1 zeroext %cond) {
 define i32 @select_cst5(i1 zeroext %cond) {
 ; RV32IM-LABEL: select_cst5:
 ; RV32IM:       # %bb.0:
-; RV32IM-NEXT:    mv a1, a0
-; RV32IM-NEXT:    li a0, 2047
-; RV32IM-NEXT:    bnez a1, .LBB47_2
+; RV32IM-NEXT:    bnez a0, .LBB47_2
 ; RV32IM-NEXT:  # %bb.1:
 ; RV32IM-NEXT:    lui a0, 1
 ; RV32IM-NEXT:    addi a0, a0, -2047
+; RV32IM-NEXT:    ret
 ; RV32IM-NEXT:  .LBB47_2:
+; RV32IM-NEXT:    li a0, 2047
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: select_cst5:
 ; RV64IM:       # %bb.0:
-; RV64IM-NEXT:    mv a1, a0
-; RV64IM-NEXT:    li a0, 2047
-; RV64IM-NEXT:    bnez a1, .LBB47_2
+; RV64IM-NEXT:    bnez a0, .LBB47_2
 ; RV64IM-NEXT:  # %bb.1:
 ; RV64IM-NEXT:    lui a0, 1
 ; RV64IM-NEXT:    addiw a0, a0, -2047
+; RV64IM-NEXT:    ret
 ; RV64IM-NEXT:  .LBB47_2:
+; RV64IM-NEXT:    li a0, 2047
 ; RV64IM-NEXT:    ret
 ;
 ; RV64IMXVTCONDOPS-LABEL: select_cst5:
diff --git a/llvm/test/CodeGen/RISCV/sextw-removal.ll b/llvm/test/CodeGen/RISCV/sextw-removal.ll
index 8cf78551d28f98..58c3dd18875571 100644
--- a/llvm/test/CodeGen/RISCV/sextw-removal.ll
+++ b/llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1032,82 +1032,86 @@ bb7:                                              ; preds = %bb2
 define signext i32 @bug(i32 signext %x) {
 ; CHECK-LABEL: bug:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    beqz a0, .LBB18_4
+; CHECK-NEXT:    beqz a0, .LBB18_5
 ; CHECK-NEXT:  # %bb.1: # %if.end
-; CHECK-NEXT:    srliw a2, a0, 16
-; CHECK-NEXT:    seqz a1, a2
-; CHECK-NEXT:    slli a1, a1, 4
-; CHECK-NEXT:    sllw a1, a0, a1
-; CHECK-NEXT:    li a0, 16
-; CHECK-NEXT:    beqz a2, .LBB18_3
+; CHECK-NEXT:    srliw a1, a0, 16
+; CHECK-NEXT:    seqz a2, a1
+; CHECK-NEXT:    slli a2, a2, 4
+; CHECK-NEXT:    sllw a0, a0, a2
+; CHECK-NEXT:    beqz a1, .LBB18_3
 ; CHECK-NEXT:  # %bb.2: # %if.end
-; CHECK-NEXT:    li a0, 32
-; CHECK-NEXT:  .LBB18_3: # %if.end
-; CHECK-NEXT:    srliw a2, a1, 24
+; CHECK-NEXT:    li a1, 32
+; CHECK-NEXT:    j .LBB18_4
+; CHECK-NEXT:  .LBB18_3:
+; CHECK-NEXT:    li a1, 16
+; CHECK-NEXT:  .LBB18_4: # %if.end
+; CHECK-NEXT:    srliw a2, a0, 24
 ; CHECK-NEXT:    seqz a2, a2
 ; CHECK-NEXT:    slli a3, a2, 3
-; CHECK-NEXT:    sllw a1, a1, a3
+; CHECK-NEXT:    sllw a0, a0, a3
 ; CHECK-NEXT:    negw a2, a2
 ; CHECK-NEXT:    andi a2, a2, -8
-; CHECK-NEXT:    add a0, a0, a2
-; CHECK-NEXT:    srliw a2, a1, 28
+; CHECK-NEXT:    add a1, a1, a2
+; CHECK-NEXT:    srliw a2, a0, 28
 ; CHECK-NEXT:    seqz a2, a2
 ; CHECK-NEXT:    slli a3, a2, 2
-; CHECK-NEXT:    sllw a1, a1, a3
+; CHECK-NEXT:    sllw a0, a0, a3
 ; CHECK-NEXT:    negw a2, a2
 ; CHECK-NEXT:    andi a2, a2, -4
-; CHECK-NEXT:    add a0, a0, a2
-; CHECK-NEXT:    srliw a2, a1, 30
+; CHECK-NEXT:    add a1, a1, a2
+; CHECK-NEXT:    srliw a2, a0, 30
 ; CHECK-NEXT:    seqz a2, a2
 ; CHECK-NEXT:    slli a3, a2, 1
-; CHECK-NEXT:    sllw a1, a1, a3
+; CHECK-NEXT:    sllw a0, a0, a3
 ; CHECK-NEXT:    negw a2, a2
 ; CHECK-NEXT:    andi a2, a2, -2
-; CHECK-NEXT:    add a0, a0, a2
-; CHECK-NEXT:    not a1, a1
-; CHECK-NEXT:    srli a1, a1, 31
-; CHECK-NEXT:    addw a0, a0, a1
-; CHECK-NEXT:  .LBB18_4: # %cleanup
+; CHECK-NEXT:    add a1, a1, a2
+; CHECK-NEXT:    not a0, a0
+; CHECK-NEXT:    srli a0, a0, 31
+; CHECK-NEXT:    addw a0, a1, a0
+; CHECK-NEXT:  .LBB18_5: # %cleanup
 ; CHECK-NEXT:    ret
 ;
 ; NOREMOVAL-LABEL: bug:
 ; NOREMOVAL:       # %bb.0: # %entry
-; NOREMOVAL-NEXT:    beqz a0, .LBB18_4
+; NOREMOVAL-NEXT:    beqz a0, .LBB18_5
 ; NOREMOVAL-NEXT:  # %bb.1: # %if.end
-; NOREMOVAL-NEXT:    srliw a2, a0, 16
-; NOREMOVAL-NEXT:    seqz a1, a2
-; NOREMOVAL-NEXT:    slli a1, a1, 4
-; NOREMOVAL-NEXT:    sllw a1, a0, a1
-; NOREMOVAL-NEXT:    li a0, 16
-; NOREMOVAL-NEXT:    beqz a2, .LBB18_3
+; NOREMOVAL-NEXT:    srliw a1, a0, 16
+; NOREMOVAL-NEXT:    seqz a2, a1
+; NOREMOVAL-NEXT:    slli a2, a2, 4
+; NOREMOVAL-NEXT:    sllw a0, a0, a2
+; NOREMOVAL-NEXT:    beqz a1, .LBB18_3
 ; NOREMOVAL-NEXT:  # %bb.2: # %if.end
-; NOREMOVAL-NEXT:    li a0, 32
-; NOREMOVAL-NEXT:  .LBB18_3: # %if.end
-; NOREMOVAL-NEXT:    srliw a2, a1, 24
+; NOREMOVAL-NEXT:    li a1, 32
+; NOREMOVAL-NEXT:    j .LBB18_4
+; NOREMOVAL-NEXT:  .LBB18_3:
+; NOREMOVAL-NEXT:    li a1, 16
+; NOREMOVAL-NEXT:  .LBB18_4: # %if.end
+; NOREMOVAL-NEXT:    srliw a2, a0, 24
 ; NOREMOVAL-NEXT:    seqz a2, a2
 ; NOREMOVAL-NEXT:    slli a3, a2, 3
-; NOREMOVAL-NEXT:    sllw a1, a1, a3
+; NOREMOVAL-NEXT:    sllw a0, a0, a3
 ; NOREMOVAL-NEXT:    negw a2, a2
 ; NOREMOVAL-NEXT:    andi a2, a2, -8
-; NOREMOVAL-NEXT:    add a0, a0, a2
-; NOREMOVAL-NEXT:    srliw a2, a1, 28
+; NOREMOVAL-NEXT:    add a1, a1, a2
+; NOREMOVAL-NEXT:    srliw a2, a0, 28
 ; NOREMOVAL-NEXT:    seqz a2, a2
 ; NOREMOVAL-NEXT:    slli a3, a2, 2
-; NOREMOVAL-NEXT:    sllw a1, a1, a3
+; NOREMOVAL-NEXT:    sllw a0, a0, a3
 ; NOREMOVAL-NEXT:    negw a2, a2
 ; NOREMOVAL-NEXT:    andi a2, a2, -4
-; NOREMOVAL-NEXT:    add a0, a0, a2
-; NOREMOVAL-NEXT:    srliw a2, a1, 30
+; NOREMOVAL-NEXT:    add a1, a1, a2
+; NOREMOVAL-NEXT:    srliw a2, a0, 30
 ; NOREMOVAL-NEXT:    seqz a2, a2
 ; NOREMOVAL-NEXT:    slli a3, a2, 1
-; NOREMOVAL-NEXT:    sllw a1, a1, a3
+; NOREMOVAL-NEXT:    sllw a0, a0, a3
 ; NOREMOVAL-NEXT:    negw a2, a2
 ; NOREMOVAL-NEXT:    andi a2, a2, -2
-; NOREMOVAL-NEXT:    add a0, a0, a2
-; NOREMOVAL-NEXT:    not a1, a1
-; NOREMOVAL-NEXT:    srli a1, a1, 31
-; NOREMOVAL-NEXT:    addw a0, a0, a1
-; NOREMOVAL-NEXT:  .LBB18_4: # %cleanup
+; NOREMOVAL-NEXT:    add a1, a1, a2
+; NOREMOVAL-NEXT:    not a0, a0
+; NOREMOVAL-NEXT:    srli a0, a0, 31
+; NOREMOVAL-NEXT:    addw a0, a1, a0
+; NOREMOVAL-NEXT:  .LBB18_5: # %cleanup
 ; NOREMOVAL-NEXT:    ret
 entry:
   %tobool.not = icmp eq i32 %x, 0
diff --git a/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
index ec7e0ecce80caa..ae1aabed498059 100644
--- a/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
+++ b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
@@ -7,13 +7,14 @@ define zeroext i16 @overflow_add(i16 zeroext %a, i16 zeroext %b) {
 ; CHECK-NEXT:    add a0, a1, a0
 ; CHECK-NEXT:    ori a0, a0, 1
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    srli a1, a0, 48
-; CHECK-NEXT:    li a2, 1024
-; CHECK-NEXT:    li a0, 2
-; CHECK-NEXT:    bltu a2, a1, .LBB0_2
+; CHECK-NEXT:    srli a0, a0, 48
+; CHECK-NEXT:    li a1, 1024
+; CHECK-NEXT:    bltu a1, a0, .LBB0_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB0_2:
+; CHECK-NEXT:    li a0, 2
 ; CHECK-NEXT:    ret
   %add = add i16 %b, %a
   %or = or i16 %add, 1
@@ -28,13 +29,14 @@ define zeroext i16 @overflow_sub(i16 zeroext %a, i16 zeroext %b) {
 ; CHECK-NEXT:    subw a0, a0, a1
 ; CHECK-NEXT:    ori a0, a0, 1
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    srli a1, a0, 48
-; CHECK-NEXT:    li a2, 1024
-; CHECK-NEXT:    li a0, 2
-; CHECK-NEXT:    bltu a2, a1, .LBB1_2
+; CHECK-NEXT:    srli a0, a0, 48
+; CHECK-NEXT:    li a1, 1024
+; CHECK-NEXT:    bltu a1, a0, .LBB1_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB1_2:
+; CHECK-NEXT:    li a0, 2
 ; CHECK-NEXT:    ret
   %add = sub i16 %a, %b
   %or = or i16 %add, 1
@@ -49,13 +51,14 @@ define zeroext i16 @overflow_mul(i16 zeroext %a, i16 zeroext %b) {
 ; CHECK-NEXT:    mul a0, a1, a0
 ; CHECK-NEXT:    ori a0, a0, 1
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    srli a1, a0, 48
-; CHECK-NEXT:    li a2, 1024
-; CHECK-NEXT:    li a0, 2
-; CHECK-NEXT:    bltu a2, a1, .LBB2_2
+; CHECK-NEXT:    srli a0, a0, 48
+; CHECK-NEXT:    li a1, 1024
+; CHECK-NEXT:    bltu a1, a0, .LBB2_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB2_2:
+; CHECK-NEXT:    li a0, 2
 ; CHECK-NEXT:    ret
   %add = mul i16 %b, %a
   %or = or i16 %add, 1
@@ -70,13 +73,14 @@ define zeroext i16 @overflow_shl(i16 zeroext %a, i16 zeroext %b) {
 ; CHECK-NEXT:    sll a0, a0, a1
 ; CHECK-NEXT:    ori a0, a0, 1
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    srli a1, a0, 48
-; CHECK-NEXT:    li a2, 1024
-; CHECK-NEXT:    li a0, 2
-; CHECK-NEXT:    bltu a2, a1, .LBB3_2
+; CHECK-NEXT:    srli a0, a0, 48
+; CHECK-NEXT:    li a1, 1024
+; CHECK-NEXT:    bltu a1, a0, .LBB3_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB3_2:
+; CHECK-NEXT:    li a0, 2
 ; CHECK-NEXT:    ret
   %add = shl i16 %a, %b
   %or = or i16 %add, 1
@@ -89,12 +93,13 @@ define i32 @overflow_add_no_consts(i8 zeroext %a, i8 zeroext %b, i8 zeroext %lim
 ; CHECK-LABEL: overflow_add_no_consts:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    add a0, a1, a0
-; CHECK-NEXT:    andi a1, a0, 255
-; CHECK-NEXT:    li a0, 8
-; CHECK-NEXT:    bltu a2, a1, .LBB4_2
+; CHECK-NEXT:    andi a0, a0, 255
+; CHECK-NEXT:    bltu a2, a0, .LBB4_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB4_2:
+; CHECK-NEXT:    li a0, 8
 ; CHECK-NEXT:    ret
   %add = add i8 %b, %a
   %cmp = icmp ugt i8 %add, %limit
@@ -106,13 +111,14 @@ define i32 @overflow_add_const_limit(i8 zeroext %a, i8 zeroext %b) {
 ; CHECK-LABEL: overflow_add_const_limit:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    add a0, a1, a0
-; CHECK-NEXT:    andi a1, a0, 255
-; CHECK-NEXT:    li a2, 128
-; CHECK-NEXT:    li a0, 8
-; CHECK-NEXT:    bltu a2, a1, .LBB5_2
+; CHECK-NEXT:    andi a0, a0, 255
+; CHECK-NEXT:    li a1, 128
+; CHECK-NEXT:    bltu a1, a0, .LBB5_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB5_2:
+; CHECK-NEXT:    li a0, 8
 ; CHECK-NEXT:    ret
   %add = add i8 %b, %a
   %cmp = icmp ugt i8 %add, -128
@@ -124,13 +130,14 @@ define i32 @overflow_add_positive_const_limit(i8 zeroext %a) {
 ; CHECK-LABEL: overflow_add_positive_const_limit:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    slli a0, a0, 56
-; CHECK-NEXT:    srai a1, a0, 56
-; CHECK-NEXT:    li a2, -1
-; CHECK-NEXT:    li a0, 8
-; CHECK-NEXT:    blt a1, a2, .LBB6_2
+; CHECK-NEXT:    srai a0, a0, 56
+; CHECK-NEXT:    li a1, -1
+; CHECK-NEXT:    blt a0, a1, .LBB6_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB6_2:
+; CHECK-NEXT:    li a0, 8
 ; CHECK-NEXT:    ret
   %cmp = icmp slt i8 %a, -1
   %res = select i1 %cmp, i32 8, i32 16
@@ -140,13 +147,13 @@ define i32 @overflow_add_positive_const_limit(i8 zeroext %a) {
 define i32 @unsafe_add_underflow(i8 zeroext %a) {
 ; CHECK-LABEL: unsafe_add_underflow:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    mv a1, a0
-; CHECK-NEXT:    li a2, 1
-; CHECK-NEXT:    li a0, 8
-; CHECK-NEXT:    beq a1, a2, .LBB7_2
+; CHECK-NEXT:    li a1, 1
+; CHECK-NEXT:    beq a0, a1, .LBB7_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB7_2:
+; CHECK-NEXT:    li a0, 8
 ; CHECK-NEXT:    ret
   %cmp = icmp eq i8 %a, 1
   %res = select i1 %cmp, i32 8, i32 16
@@ -156,12 +163,12 @@ define i32 @unsafe_add_underflow(i8 zeroext %a) {
 define i32 @safe_add_underflow(i8 zeroext %a) {
 ; CHECK-LABEL: safe_add_underflow:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    mv a1, a0
-; CHECK-NEXT:    li a0, 8
-; CHECK-NEXT:    beqz a1, .LBB8_2
+; CHECK-NEXT:    beqz a0, .LBB8_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB8_2:
+; CHECK-NEXT:    li a0, 8
 ; CHECK-NEXT:    ret
   %cmp = icmp eq i8 %a, 0
   %res = select i1 %cmp, i32 8, i32 16
@@ -171,13 +178,14 @@ define i32 @safe_add_underflow(i8 zeroext %a) {
 define i32 @safe_add_underflow_neg(i8 zeroext %a) {
 ; CHECK-LABEL: safe_add_underflow_neg:
 ; CHECK:       # %bb.0:
-; CHECK-N...
[truncated]

@preames
Collaborator Author

preames commented Sep 16, 2024

Looks like I screwed up this PR - I'd not updated all the tests. Working on fixing that, will update once done.


github-actions bot commented Sep 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

define i32 @f_medium_ledge_pos(i32 %in0) minsize {
%cmp = icmp CMPCOND i32 %in0, 32
%cmp = icmp CMPCOND i32 %in0, 33
Collaborator Author

The test delta in this file needs a bit of explanation. Most of the diff is me trying to understand the testing to find the actual problem. The only actual change is this line.

Essentially, the original test is confused about what it's testing. "32" can be handled via the c.addi offset scheme. However, here's the assembly we originally got:

	addi	a1, a0, -32
	li	a0, -99
	bnez	a1, .LBB0_2
# %bb.1:
	li	a0, 42
.LBB0_2:
	ret

Note that the c.addi form can only be matched when source and destination register are the same.

Once this change sinks the unrelated LI out of the way, the register allocator can do that, and we end up with a compressed addi.

From the naming of the test and the comment, this doesn't appear to have been the intended result, and thus I adjusted the constant to 33. If reviewers agree that this is the right test update, I'm going to land a rework of this test file separately.

Collaborator

I agree this is the right test update.

Collaborator Author

I landed these changes on main, and eventually figured out we could autogen this test without losing the spirit of the compressed checks. The resulting diff (there is a much bigger one than even I'd realized) should be a lot more obvious now. They do look to be improvements.

@@ -91,14 +87,14 @@ define i32 @f_small_edge_neg(i32 %in0) minsize {
; constant is medium and not fit in 6 bit (compress imm),
; but fit in 12 bit (imm)
; RV32IFDC-LABEL: <f_medium_ledge_pos>:
; RV32IFDC: addi [[MAYZEROREG:.*]], [[REG:.*]], -0x20
; RV32IFDC: RESBROPT [[MAYZEROREG]], [[PLACE:.*]]
; RV32IFDC: addi [[MAYZEROREG:.*]], [[REG:.*]], -0x21
Collaborator

Is REG here zero like in the RV32IFD case?

Collaborator

Nevermind. It's not.

preames added a commit that referenced this pull request Sep 17, 2024
Per the comment, this test is intending to test the first constant which
can't be encoded via a c.addi.  However, -32 *can* be encoded in a
c.addi, and all that's preventing it from doing so is the register
allocator's choice to use a different destination register on the
add than its source.  (Which the compressed form doesn't support.)

The current LLC codegen for this test looks like:

	addi	a1, a0, -32
	li	a0, -99
	bnez	a1, .LBB0_2
	li	a0, 42
.LBB0_2:
	ret

After #108889, we sink the LI, and
the register allocator picks the same source and dest register for the addi
resulting in the c.addi form being emitted.  So, to avoid a confusing diff
let's fix the test to check what was originally intended.
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
@asb
Contributor

asb commented Sep 20, 2024

Looking at the diff for this with the GCC torture suite, I think there's one case where this change makes codegen slightly worse that would be worth you having a quick look at - 20020413-1.c. Previously, each conditional BB loaded an immediate and then branched to the common exit block. After this change, the BBs conditionally branch to the BB that loads the needed immediate, which then branches to the common exit block.

I don't think this is a blocker, as overall the change is positive.

@preames
Collaborator Author

preames commented Nov 25, 2024

Looking at the diff for this with the GCC torture suite, I think there's one case where this change makes codegen slightly worse that would be worth you having a quick look at - 20020413-1.c. Previously, each conditional BB loaded an immediate and then branched to the common exit block. After this change, the BBs conditionally branch to the BB that loads the needed immediate, which then branches to the common exit block.

I don't think this is a blocker, as overall the change is positive.

I took a look at this, and it's basically the patch working as intended. One interesting quirk is that it turned out to be specific to having a series of branches. With branch dispatch you see a code change with this patch. With switch dispatch, we already sink into the successor blocks (by construction). So, if anything, it's another reason to land this patch - consistency!

Can I get an LGTM here?

Collaborator

@topperc left a comment

LGTM

@preames preames merged commit 6657d4b into llvm:main Nov 26, 2024
9 checks passed
@preames preames deleted the pr-riscv-machine-sink-addi branch November 26, 2024 02:59
@llvm-ci
Collaborator

llvm-ci commented Nov 26, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/11013

Here is the relevant piece of the build log for reference:
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: sanitizer/kernel_crash_async.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# RUN: at line 3
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# note: command had no output on stdout or stderr
# RUN: at line 4
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# note: command had no output on stdout or stderr
# RUN: at line 5
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -g
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -g
# note: command had no output on stdout or stderr
# RUN: at line 6
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# note: command had no output on stdout or stderr
# RUN: at line 7
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c:39:11: error: CHECK: expected string not found in input
# | // CHECK: Kernel {{[0-9]}}: {{.*}} (__omp_offloading_{{.*}}_main_l29)
# |           ^
# | <stdin>:1:1: note: scanning from here
# | Display only launched kernel:
# | ^
# | <stdin>:2:23: note: possible intended match here
# | Kernel 'omp target in main @ 29 (__omp_offloading_802_b38838e_main_l29)'
# |                       ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c
# | 
...

mga-sc added a commit to mga-sc/llvm-project that referenced this pull request Nov 26, 2024
This test started failing after commit
[RISCV] Implement tail call optimization in machine outliner (PR llvm#115297).

Changes to the same file were merged today earlier
[TTI][RISCV] Unconditionally break critical edges to sink ADDI (PR llvm#108889).
wangpc-pp pushed a commit that referenced this pull request Nov 26, 2024
#117700)

This MR fixes the failing test `CodeGen/RISCV/compress-opt-select.ll`.

It failed due to the previously merged commit `[TTI][RISCV]
Unconditionally break critical edges to sink ADDI (PR #108889)`.

So, regenerated `compress-opt-select` test.
@michaelmaitland
Copy link
Contributor

michaelmaitland commented Mar 10, 2025

@topperc and I are finding that this change is causing a performance regression in dhrystone and perlbench_r.

It leads to splitting the edge and sinking which creates a new basic block and adds an extra branch that takes additional cycles.

Would it be possible for us to use better heuristics on when to split the edges?

I reduced the Perl_sv_cmp_flags function from 500.perlbench_r to highlight what is going on:

define fastcc i32 @Perl_sv_cmp_flags(ptr %sv1, ptr %sv2, i1 %cmp1, i32 %spec.select) nounwind {
entry:
  %tobool.not = icmp eq ptr %sv1, null
  br i1 %tobool.not, label %if.end, label %if.else

if.else:                                          ; preds = %entry
  br i1 %cmp1, label %cond.true, label %if.end

cond.true:                                        ; preds = %if.else
  br label %if.end

if.end:                                           ; preds = %cond.true, %if.else, %entry
  %pv1.0 = phi ptr [ %sv2, %cond.true ], [ null, %entry ], [ null, %if.else ]
  %tobool2.not = icmp eq ptr %sv2, null
  br i1 %tobool2.not, label %if.end106.thread, label %if.end18

if.end18:                                         ; preds = %if.end
  %call121 = call i32 null(ptr %pv1.0, ptr null, i64 0)
  br i1 %cmp1, label %if.else126, label %cleanup

if.end106.thread:                                 ; preds = %if.end
  %tobool107.not48 = icmp eq i64 0, 0
  br i1 %tobool107.not48, label %if.then108, label %cleanup

if.then108:                                       ; preds = %if.end106.thread
  br label %cleanup

if.else126:                                       ; preds = %if.end18
  br label %cleanup

cleanup:                                          ; preds = %if.else126, %if.then108, %if.end106.thread, %if.end18
  %retval.0 = phi i32 [ 1, %if.end106.thread ], [ 0, %if.then108 ], [ %spec.select, %if.else126 ], [ 0, %if.end18 ]
  ret i32 %retval.0
}
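The structural issue can be seen by computing the critical edges of the reduced CFG above - an edge A→B where A has multiple successors and B has multiple predecessors, which is exactly what the patch now splits in order to sink. A small sketch (a hypothetical helper, not LLVM's MachineSink code):

```python
def critical_edges(succs):
    # An edge (a, b) is critical when a has several successors and
    # b has several predecessors; sinking into b requires splitting the edge.
    preds = {}
    for a, ss in succs.items():
        for b in ss:
            preds.setdefault(b, set()).add(a)
    return sorted((a, b) for a, ss in succs.items() if len(ss) > 1
                  for b in ss if len(preds.get(b, ())) > 1)

# CFG of the reduced Perl_sv_cmp_flags above.
cfg = {
    "entry": ["if.end", "if.else"],
    "if.else": ["cond.true", "if.end"],
    "cond.true": ["if.end"],
    "if.end": ["if.end106.thread", "if.end18"],
    "if.end18": ["if.else126", "cleanup"],
    "if.end106.thread": ["if.then108", "cleanup"],
    "if.then108": ["cleanup"],
    "if.else126": ["cleanup"],
    "cleanup": [],
}
for edge in critical_edges(cfg):
    print(edge)
```

On this CFG the helper reports four critical edges, among them the two into %cleanup (from %if.end18 and %if.end106.thread) that carry the phi constants being sunk.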

With this reduced example, we used to generate:

Perl_sv_cmp_flags:                      # @Perl_sv_cmp_flags
# %bb.0:                                # %entry
        addi    sp, sp, -32
        sd      ra, 24(sp)                      # 8-byte Folded Spill
        sd      s0, 16(sp)                      # 8-byte Folded Spill
        sd      s1, 8(sp)                       # 8-byte Folded Spill
        sd      s2, 0(sp)                       # 8-byte Folded Spill
        mv      s2, a3
        andi    s0, a2, 1
        beqz    a0, .LBB0_3
# %bb.1:                                # %if.else
        beqz    s0, .LBB0_6
# %bb.2:                                # %cond.true
        mv      a0, a1
.LBB0_3:                                # %if.end
        beqz    a1, .LBB0_7
.LBB0_4:                                # %if.end18
        li      s1, 0
        li      a1, 0
        li      a2, 0
        jalr    s1
        beqz    s0, .LBB0_9
# %bb.5:                                # %if.else126
        mv      s1, s2
        j       .LBB0_9
.LBB0_6:
        li      a0, 0
        bnez    a1, .LBB0_4
.LBB0_7:                                # %if.end106.thread
        li      s1, 1
        bnez    zero, .LBB0_9
# %bb.8:                                # %if.then108
        li      s1, 0
.LBB0_9:                                # %cleanup
        mv      a0, s1
        ld      ra, 24(sp)                      # 8-byte Folded Reload
        ld      s0, 16(sp)                      # 8-byte Folded Reload
        ld      s1, 8(sp)                       # 8-byte Folded Reload
        ld      s2, 0(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 32
        ret

Now we generate:

Perl_sv_cmp_flags:                      # @Perl_sv_cmp_flags
# %bb.0:                                # %entry
        addi    sp, sp, -32
        sd      ra, 24(sp)                      # 8-byte Folded Spill
        sd      s0, 16(sp)                      # 8-byte Folded Spill
        sd      s1, 8(sp)                       # 8-byte Folded Spill
        sd      s2, 0(sp)                       # 8-byte Folded Spill
        mv      s2, a3
        andi    s0, a2, 1
        beqz    a0, .LBB0_3
# %bb.1:                                # %if.else
        beqz    s0, .LBB0_6
# %bb.2:                                # %cond.true
        mv      a0, a1
.LBB0_3:                                # %if.end
        beqz    a1, .LBB0_7
.LBB0_4:                                # %if.end18
        li      s1, 0
        li      a1, 0
        li      a2, 0
        jalr    s1
        beqz    s0, .LBB0_10
# %bb.5:                                # %if.else126
        mv      s1, s2
        j       .LBB0_10
.LBB0_6:
        li      a0, 0
        bnez    a1, .LBB0_4
.LBB0_7:                                # %if.end106.thread
        beqz    zero, .LBB0_9
# %bb.8:
        li      s1, 1
        j       .LBB0_10
.LBB0_9:                                # %if.then108
        li      s1, 0
.LBB0_10:                               # %cleanup
        mv      a0, s1
        ld      ra, 24(sp)                      # 8-byte Folded Reload
        ld      s0, 16(sp)                      # 8-byte Folded Reload
        ld      s1, 8(sp)                       # 8-byte Folded Reload
        ld      s2, 0(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 32
        ret

In this reduced case, we end up with an extra basic block (and consequently an extra jump) as a result of this PR.

Please note that I found this in the Perl_sv_cmp_flags function of perlbench, but when this patch is reverted, it helps the dynamic instruction count in S_regmatch much more than it does in Perl_sv_cmp_flags.

@preames
Copy link
Collaborator Author

preames commented Mar 10, 2025

@michaelmaitland - I will take a look at this tomorrow, and get back to you.

@preames
Copy link
Collaborator Author

preames commented Mar 11, 2025

@michaelmaitland Can I ask you to file a bug for the regression you noted? I think this may have been reduced a bit past usefulness. Looking at the after assembly, the "right" answer for this case is for this:

        beqz    zero, .LBB0_9
# %bb.8:
        li      s1, 1
        j       .LBB0_10
.LBB0_9:                                # %if.then108
        li      s1, 0
.LBB0_10:                               # %cleanup

To become a snez. However, when I opened up the precodegen IR for the routine you mentioned, I don't see this pattern in the IR. I'm not claiming that my particular snapshot from some time ago is evidence against there being a regression here. I'm just saying I need to know how to reproduce the original example, not just the reduced case which may or may not be representative. I suspect that e.g. LTO vs non-LTO might be very important here.
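For reference, the branch-and-li diamond above materializes a boolean (cond != 0) into a register, which RISC-V can do branchlessly with a single snez (set if not equal to zero). A plain-Python sketch of the equivalence (hypothetical, just to illustrate the intended fold):

```python
def via_branch(cond: int) -> int:
    # Models the .LBB0_8 / .LBB0_9 diamond: branch, then li 1 or li 0.
    if cond == 0:
        return 0
    return 1

def via_snez(cond: int) -> int:
    # Models "snez rd, rs": rd = 1 if rs != 0 else 0, with no branch.
    return int(cond != 0)

print(all(via_branch(c) == via_snez(c) for c in (-5, 0, 1, 123)))  # → True
```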

For the record, I do not plan to revert this without significant investigation first. It's been four months since this landed.
