[RISCV] Use SLLI/ADDI when materializing select of constants #155644
Conversation
This avoids the need to materialize the difference explicitly, and thus reduces register pressure when the condition value is otherwise unused.

@reviewers:
1) This is deliberately limited to a simple subset. In followups, I'm going to handle the swapped case, but that's a bit more complex and might best be done by refactoring some of the existing combines.
2) We should arguably be doing this for all configurations, not just Zicond.
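For a concrete example (taken from the select_cst_diff4 test updated below): the two arms differ by a power of two (10 - 6 = 4 = 1 << 2), so the select lowers to the false value plus the condition shifted left by log2 of the difference:

  %ret = select i1 %cond, i32 10, i32 6
  ; with the zero-extended condition in a0, this becomes:
  slli a0, a0, 2    ; a0 = cond << 2 (0 or 4)
  addi a0, a0, 6    ; a0 = 6 + (cond << 2), i.e. 10 if cond, else 6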
@llvm/pr-subscribers-backend-risc-v — Author: Philip Reames (preames)
Full diff: https://github.com/llvm/llvm-project/pull/155644.diff — 3 files affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index d1e413b378542..0837f6ba85d7c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9287,6 +9287,14 @@ SDValue RISCVTargetLowering::lowerSELECT(SDValue Op, SelectionDAG &DAG) const {
}
}
+ // Use SHL/ADDI to avoid having to materialize a constant in register
+ // TODO: Handle the inverse case when the condition can be cheaply flipped
+ if ((TrueVal - FalseVal).isPowerOf2() && FalseVal.isSignedIntN(12)) {
+ SDValue Log2 = DAG.getConstant((TrueVal - FalseVal).logBase2(), DL, VT);
+ SDValue BitDiff = DAG.getNode(ISD::SHL, DL, VT, CondV, Log2);
+ return DAG.getNode(ISD::ADD, DL, VT, FalseV, BitDiff);
+ }
+
auto getCost = [&](const APInt &Delta, const APInt &Addend) {
const int DeltaCost = RISCVMatInt::getIntMatCost(
Delta, Subtarget.getXLen(), Subtarget, /*CompressionCost=*/true);
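As a sanity check of the identity the new combine relies on — select(cond, FalseVal + 2^k, FalseVal) == FalseVal + (cond << k) — here is a small standalone C++ sketch. The function and value ranges are hypothetical illustrations, not LLVM code:

  #include <cassert>
  #include <cstdint>

  // Hypothetical standalone check: when TrueVal - FalseVal == 1 << K,
  // select(cond, TrueVal, FalseVal) == FalseVal + (cond << K).
  int64_t selectViaShiftAdd(bool Cond, int64_t FalseVal, unsigned K) {
    return FalseVal + (static_cast<int64_t>(Cond) << K); // SLLI + ADDI
  }

  int main() {
    for (unsigned K = 0; K < 12; ++K)
      for (int64_t F = -8; F <= 8; ++F)
        for (bool C : {false, true})
          assert(selectViaShiftAdd(C, F, K) ==
                 (C ? F + (int64_t(1) << K) : F));
    return 0;
  }

This is why the lowering only needs the condition bit already in a register: the shift amount and addend are immediates, so no second constant has to be materialized.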
diff --git a/llvm/test/CodeGen/RISCV/select-const.ll b/llvm/test/CodeGen/RISCV/select-const.ll
index b734af5002403..e838710878d68 100644
--- a/llvm/test/CodeGen/RISCV/select-const.ll
+++ b/llvm/test/CodeGen/RISCV/select-const.ll
@@ -701,8 +701,8 @@ define i32 @diff_shl_addi2(i32 signext %x) {
; RV32ZICOND-LABEL: diff_shl_addi2:
; RV32ZICOND: # %bb.0:
; RV32ZICOND-NEXT: srli a0, a0, 31
-; RV32ZICOND-NEXT: lui a1, 4
-; RV32ZICOND-NEXT: czero.nez a0, a1, a0
+; RV32ZICOND-NEXT: xori a0, a0, 1
+; RV32ZICOND-NEXT: slli a0, a0, 14
; RV32ZICOND-NEXT: addi a0, a0, 25
; RV32ZICOND-NEXT: ret
;
@@ -731,8 +731,8 @@ define i32 @diff_shl_addi2(i32 signext %x) {
; RV64ZICOND-LABEL: diff_shl_addi2:
; RV64ZICOND: # %bb.0:
; RV64ZICOND-NEXT: srli a0, a0, 63
-; RV64ZICOND-NEXT: lui a1, 4
-; RV64ZICOND-NEXT: czero.nez a0, a1, a0
+; RV64ZICOND-NEXT: xori a0, a0, 1
+; RV64ZICOND-NEXT: slli a0, a0, 14
; RV64ZICOND-NEXT: addiw a0, a0, 25
; RV64ZICOND-NEXT: ret
%cmp = icmp sgt i32 %x, -1
diff --git a/llvm/test/CodeGen/RISCV/select.ll b/llvm/test/CodeGen/RISCV/select.ll
index 2e1784d369680..c897dd9368248 100644
--- a/llvm/test/CodeGen/RISCV/select.ll
+++ b/llvm/test/CodeGen/RISCV/select.ll
@@ -1790,15 +1790,13 @@ define i32 @select_cst5_invert(i1 zeroext %cond) {
;
; RV64IMXVTCONDOPS-LABEL: select_cst5_invert:
; RV64IMXVTCONDOPS: # %bb.0:
-; RV64IMXVTCONDOPS-NEXT: li a1, 2
-; RV64IMXVTCONDOPS-NEXT: vt.maskc a0, a1, a0
+; RV64IMXVTCONDOPS-NEXT: slli a0, a0, 1
; RV64IMXVTCONDOPS-NEXT: addi a0, a0, 2047
; RV64IMXVTCONDOPS-NEXT: ret
;
; CHECKZICOND-LABEL: select_cst5_invert:
; CHECKZICOND: # %bb.0:
-; CHECKZICOND-NEXT: li a1, 2
-; CHECKZICOND-NEXT: czero.eqz a0, a1, a0
+; CHECKZICOND-NEXT: slli a0, a0, 1
; CHECKZICOND-NEXT: addi a0, a0, 2047
; CHECKZICOND-NEXT: ret
%ret = select i1 %cond, i32 2049, i32 2047
@@ -1873,17 +1871,21 @@ define i32 @select_cst_diff2_invert(i1 zeroext %cond) {
;
; RV64IMXVTCONDOPS-LABEL: select_cst_diff2_invert:
; RV64IMXVTCONDOPS: # %bb.0:
-; RV64IMXVTCONDOPS-NEXT: li a1, -2
-; RV64IMXVTCONDOPS-NEXT: vt.maskcn a0, a1, a0
-; RV64IMXVTCONDOPS-NEXT: addi a0, a0, 122
+; RV64IMXVTCONDOPS-NEXT: slli a0, a0, 1
+; RV64IMXVTCONDOPS-NEXT: addiw a0, a0, 120
; RV64IMXVTCONDOPS-NEXT: ret
;
-; CHECKZICOND-LABEL: select_cst_diff2_invert:
-; CHECKZICOND: # %bb.0:
-; CHECKZICOND-NEXT: li a1, -2
-; CHECKZICOND-NEXT: czero.nez a0, a1, a0
-; CHECKZICOND-NEXT: addi a0, a0, 122
-; CHECKZICOND-NEXT: ret
+; RV32IMZICOND-LABEL: select_cst_diff2_invert:
+; RV32IMZICOND: # %bb.0:
+; RV32IMZICOND-NEXT: slli a0, a0, 1
+; RV32IMZICOND-NEXT: addi a0, a0, 120
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_cst_diff2_invert:
+; RV64IMZICOND: # %bb.0:
+; RV64IMZICOND-NEXT: slli a0, a0, 1
+; RV64IMZICOND-NEXT: addiw a0, a0, 120
+; RV64IMZICOND-NEXT: ret
%ret = select i1 %cond, i32 122, i32 120
ret i32 %ret
}
@@ -1911,16 +1913,14 @@ define i32 @select_cst_diff4(i1 zeroext %cond) {
;
; RV64IMXVTCONDOPS-LABEL: select_cst_diff4:
; RV64IMXVTCONDOPS: # %bb.0:
-; RV64IMXVTCONDOPS-NEXT: li a1, -4
-; RV64IMXVTCONDOPS-NEXT: vt.maskcn a0, a1, a0
-; RV64IMXVTCONDOPS-NEXT: addi a0, a0, 10
+; RV64IMXVTCONDOPS-NEXT: slli a0, a0, 2
+; RV64IMXVTCONDOPS-NEXT: addi a0, a0, 6
; RV64IMXVTCONDOPS-NEXT: ret
;
; CHECKZICOND-LABEL: select_cst_diff4:
; CHECKZICOND: # %bb.0:
-; CHECKZICOND-NEXT: li a1, -4
-; CHECKZICOND-NEXT: czero.nez a0, a1, a0
-; CHECKZICOND-NEXT: addi a0, a0, 10
+; CHECKZICOND-NEXT: slli a0, a0, 2
+; CHECKZICOND-NEXT: addi a0, a0, 6
; CHECKZICOND-NEXT: ret
%ret = select i1 %cond, i32 10, i32 6
ret i32 %ret
@@ -1987,17 +1987,21 @@ define i32 @select_cst_diff8(i1 zeroext %cond) {
;
; RV64IMXVTCONDOPS-LABEL: select_cst_diff8:
; RV64IMXVTCONDOPS: # %bb.0:
-; RV64IMXVTCONDOPS-NEXT: li a1, -8
-; RV64IMXVTCONDOPS-NEXT: vt.maskcn a0, a1, a0
-; RV64IMXVTCONDOPS-NEXT: addi a0, a0, 14
+; RV64IMXVTCONDOPS-NEXT: slli a0, a0, 3
+; RV64IMXVTCONDOPS-NEXT: addiw a0, a0, 6
; RV64IMXVTCONDOPS-NEXT: ret
;
-; CHECKZICOND-LABEL: select_cst_diff8:
-; CHECKZICOND: # %bb.0:
-; CHECKZICOND-NEXT: li a1, -8
-; CHECKZICOND-NEXT: czero.nez a0, a1, a0
-; CHECKZICOND-NEXT: addi a0, a0, 14
-; CHECKZICOND-NEXT: ret
+; RV32IMZICOND-LABEL: select_cst_diff8:
+; RV32IMZICOND: # %bb.0:
+; RV32IMZICOND-NEXT: slli a0, a0, 3
+; RV32IMZICOND-NEXT: addi a0, a0, 6
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_cst_diff8:
+; RV64IMZICOND: # %bb.0:
+; RV64IMZICOND-NEXT: slli a0, a0, 3
+; RV64IMZICOND-NEXT: addiw a0, a0, 6
+; RV64IMZICOND-NEXT: ret
%ret = select i1 %cond, i32 14, i32 6
ret i32 %ret
}
@@ -2071,17 +2075,21 @@ define i32 @select_cst_diff1024(i1 zeroext %cond) {
;
; RV64IMXVTCONDOPS-LABEL: select_cst_diff1024:
; RV64IMXVTCONDOPS: # %bb.0:
-; RV64IMXVTCONDOPS-NEXT: li a1, -1024
-; RV64IMXVTCONDOPS-NEXT: vt.maskcn a0, a1, a0
-; RV64IMXVTCONDOPS-NEXT: addi a0, a0, 1030
+; RV64IMXVTCONDOPS-NEXT: slli a0, a0, 10
+; RV64IMXVTCONDOPS-NEXT: addiw a0, a0, 6
; RV64IMXVTCONDOPS-NEXT: ret
;
-; CHECKZICOND-LABEL: select_cst_diff1024:
-; CHECKZICOND: # %bb.0:
-; CHECKZICOND-NEXT: li a1, -1024
-; CHECKZICOND-NEXT: czero.nez a0, a1, a0
-; CHECKZICOND-NEXT: addi a0, a0, 1030
-; CHECKZICOND-NEXT: ret
+; RV32IMZICOND-LABEL: select_cst_diff1024:
+; RV32IMZICOND: # %bb.0:
+; RV32IMZICOND-NEXT: slli a0, a0, 10
+; RV32IMZICOND-NEXT: addi a0, a0, 6
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_cst_diff1024:
+; RV64IMZICOND: # %bb.0:
+; RV64IMZICOND-NEXT: slli a0, a0, 10
+; RV64IMZICOND-NEXT: addiw a0, a0, 6
+; RV64IMZICOND-NEXT: ret
%ret = select i1 %cond, i32 1030, i32 6
ret i32 %ret
}
topperc left a comment:
LGTM
Followup (#155845): This case is the inverse of the one introduced in #155644. The complexity with the inversion is that we need to also invert the condition before shifting it. I had originally planned to only do so when the condition was "cheaply" invertible (i.e., didn't require the xori), but when looking more closely at the diffs I noticed that while the XORI prevents this from being an instruction-count improvement, and actually slightly lengthens the critical path, it does still reduce the number of registers needed.
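As a sketch of the inverted case (k and T here are symbolic placeholders; the identity matches the diff_shl_addi2 output above): when the false arm is the larger one, i.e. FalseVal - TrueVal == 1 << k, the condition is flipped before the shift:

  xori a0, a0, 1    ; invert the i1 condition
  slli a0, a0, k    ; (!cond) << k
  addi a0, a0, T    ; T + ((!cond) << k) == select(cond, T, T + (1 << k))

The extra XORI is the cost discussed above: one more instruction on the critical path, but still one fewer live register than materializing the difference separately.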