[RISCV] Select and/or/xor with certain constants to Zbb ANDN/ORN/XNOR #120221
Conversation
I'd like to implement an optimization so that

long orlow(long x) {
    return x | ((1L << 24) - 1);
}

which is now compiled to a lui + addi + or sequence, gets compiled to a lui + orn pair instead, and similarly for AND and XOR. I'm trying to implement this with TableGen patterns, but I'm new to that technology and got stuck on an error. Please let me know how I can fix it, or whether this approach is wrong altogether.
Here's a rough sketch using a ComplexPattern instead of PatLeaf/SDNodeXForm. This lets us reuse
Here's another case that might be useful to optimize: https://godbolt.org/z/MEzP15sas it already generates a
Force-pushed because of a precommit test.
@@ -21,8 +21,7 @@ define i1 @pr84653(i32 %x) {
; CHECK-ZBB: # %bb.0:
; CHECK-ZBB-NEXT: sext.w a1, a0
; CHECK-ZBB-NEXT: lui a2, 524288
The lui is unchanged here because this is an X ^ ((1 << 31) - 1) -> X ^ ~(1 << 31) case.
@@ -273,37 +240,21 @@ define i32 @compl(i32 %x) {
}
define i32 @orlow12(i32 %x) {
With Zbs this is:

ori a0, a0, 2047
bseti a0, a0, 11

and is not affected by this change. Shall I add +zbs, or test both with and without?
I would add RUN lines with +zbs, rather than adding +zbs to the existing RUN lines.
Done
This works great! I didn't even have to special-case
@llvm/pr-subscribers-backend-risc-v

Author: Piotr Fusik (pfusik)

Changes

Saves an addi.

Full diff: https://github.com/llvm/llvm-project/pull/120221.diff

5 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index ccf34b8a6b2b02..d77e2b1421b136 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3236,6 +3236,18 @@ bool RISCVDAGToDAGISel::selectSHXADD_UWOp(SDValue N, unsigned ShAmt,
return false;
}
+bool RISCVDAGToDAGISel::selectSImm32fff(SDValue N, SDValue &Val) {
+ if (!isa<ConstantSDNode>(N))
+ return false;
+
+ int64_t Imm = cast<ConstantSDNode>(N)->getSExtValue();
+ if (!(isInt<32>(Imm) && (Imm & 0xfff) == 0xfff && Imm != -1))
+ return false;
+
+ Val = selectImm(CurDAG, SDLoc(N), N->getSimpleValueType(0), ~Imm, *Subtarget);
+ return true;
+}
+
static bool vectorPseudoHasAllNBitUsers(SDNode *User, unsigned UserOpNo,
unsigned Bits,
const TargetInstrInfo *TII) {
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
index 2e738d8d25a6dc..cfe07277fd9ddf 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
@@ -119,6 +119,8 @@ class RISCVDAGToDAGISel : public SelectionDAGISel {
return selectSHXADD_UWOp(N, ShAmt, Val);
}
+ bool selectSImm32fff(SDValue N, SDValue &Val);
+
bool hasAllNBitUsers(SDNode *Node, unsigned Bits,
const unsigned Depth = 0) const;
bool hasAllBUsers(SDNode *Node) const { return hasAllNBitUsers(Node, 8); }
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
index a78091cd02a35f..e2050d100e4f88 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
@@ -475,10 +475,16 @@ def : InstAlias<"zext.h $rd, $rs", (PACKW GPR:$rd, GPR:$rs, X0)>;
// Codegen patterns
//===----------------------------------------------------------------------===//
+def simm32fff : ComplexPattern<XLenVT, 1, "selectSImm32fff", [], [], 0>;
+
let Predicates = [HasStdExtZbbOrZbkb] in {
def : Pat<(XLenVT (and GPR:$rs1, (not GPR:$rs2))), (ANDN GPR:$rs1, GPR:$rs2)>;
def : Pat<(XLenVT (or GPR:$rs1, (not GPR:$rs2))), (ORN GPR:$rs1, GPR:$rs2)>;
def : Pat<(XLenVT (xor GPR:$rs1, (not GPR:$rs2))), (XNOR GPR:$rs1, GPR:$rs2)>;
+
+def : Pat<(XLenVT (and GPR:$rs1, simm32fff:$rs2)), (ANDN GPR:$rs1, simm32fff:$rs2)>;
+def : Pat<(XLenVT (or GPR:$rs1, simm32fff:$rs2)), (ORN GPR:$rs1, simm32fff:$rs2)>;
+def : Pat<(XLenVT (xor GPR:$rs1, simm32fff:$rs2)), (XNOR GPR:$rs1, simm32fff:$rs2)>;
} // Predicates = [HasStdExtZbbOrZbkb]
let Predicates = [HasStdExtZbbOrZbkb] in {
diff --git a/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll b/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll
index b1bba5fdc92116..30a93557347727 100644
--- a/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll
+++ b/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll
@@ -21,8 +21,7 @@ define i1 @pr84653(i32 %x) {
; CHECK-ZBB: # %bb.0:
; CHECK-ZBB-NEXT: sext.w a1, a0
; CHECK-ZBB-NEXT: lui a2, 524288
-; CHECK-ZBB-NEXT: addi a2, a2, -1
-; CHECK-ZBB-NEXT: xor a0, a0, a2
+; CHECK-ZBB-NEXT: xnor a0, a0, a2
; CHECK-ZBB-NEXT: sext.w a0, a0
; CHECK-ZBB-NEXT: max a0, a0, zero
; CHECK-ZBB-NEXT: slt a0, a0, a1
@@ -82,8 +81,7 @@ define i1 @select_to_or(i32 %x) {
; CHECK-ZBB: # %bb.0:
; CHECK-ZBB-NEXT: sext.w a1, a0
; CHECK-ZBB-NEXT: lui a2, 524288
-; CHECK-ZBB-NEXT: addi a2, a2, -1
-; CHECK-ZBB-NEXT: xor a0, a0, a2
+; CHECK-ZBB-NEXT: xnor a0, a0, a2
; CHECK-ZBB-NEXT: sext.w a0, a0
; CHECK-ZBB-NEXT: min a0, a0, zero
; CHECK-ZBB-NEXT: slt a0, a0, a1
diff --git a/llvm/test/CodeGen/RISCV/zbb-logic-neg-imm.ll b/llvm/test/CodeGen/RISCV/zbb-logic-neg-imm.ll
new file mode 100644
index 00000000000000..86c9676bec1280
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/zbb-logic-neg-imm.ll
@@ -0,0 +1,260 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 -mattr=+zbb -verify-machineinstrs < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK,RV32
+; RUN: llc -mtriple=riscv64 -mattr=+zbb -verify-machineinstrs < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK,RV64
+
+define i32 @and0xabcdefff(i32 %x) {
+; CHECK-LABEL: and0xabcdefff:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 344865
+; CHECK-NEXT: andn a0, a0, a1
+; CHECK-NEXT: ret
+ %and = and i32 %x, -1412567041
+ ret i32 %and
+}
+
+define i32 @orlow13(i32 %x) {
+; CHECK-LABEL: orlow13:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 1048574
+; CHECK-NEXT: orn a0, a0, a1
+; CHECK-NEXT: ret
+ %or = or i32 %x, 8191
+ ret i32 %or
+}
+
+define i64 @orlow24(i64 %x) {
+; RV32-LABEL: orlow24:
+; RV32: # %bb.0:
+; RV32-NEXT: lui a2, 1044480
+; RV32-NEXT: orn a0, a0, a2
+; RV32-NEXT: ret
+;
+; RV64-LABEL: orlow24:
+; RV64: # %bb.0:
+; RV64-NEXT: lui a1, 1044480
+; RV64-NEXT: orn a0, a0, a1
+; RV64-NEXT: ret
+ %or = or i64 %x, 16777215
+ ret i64 %or
+}
+
+define i32 @xorlow16(i32 %x) {
+; CHECK-LABEL: xorlow16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 1048560
+; CHECK-NEXT: xnor a0, a0, a1
+; CHECK-NEXT: ret
+ %xor = xor i32 %x, 65535
+ ret i32 %xor
+}
+
+define i32 @xorlow31(i32 %x) {
+; CHECK-LABEL: xorlow31:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 524288
+; CHECK-NEXT: xnor a0, a0, a1
+; CHECK-NEXT: ret
+ %xor = xor i32 %x, 2147483647
+ ret i32 %xor
+}
+
+define i32 @oraddlow16(i32 %x) {
+; RV32-LABEL: oraddlow16:
+; RV32: # %bb.0:
+; RV32-NEXT: lui a1, 1048560
+; RV32-NEXT: orn a0, a0, a1
+; RV32-NEXT: lui a1, 16
+; RV32-NEXT: addi a1, a1, -1
+; RV32-NEXT: add a0, a0, a1
+; RV32-NEXT: ret
+;
+; RV64-LABEL: oraddlow16:
+; RV64: # %bb.0:
+; RV64-NEXT: lui a1, 1048560
+; RV64-NEXT: orn a0, a0, a1
+; RV64-NEXT: lui a1, 16
+; RV64-NEXT: addi a1, a1, -1
+; RV64-NEXT: addw a0, a0, a1
+; RV64-NEXT: ret
+ %or = or i32 %x, 65535
+ %add = add nsw i32 %or, 65535
+ ret i32 %add
+}
+
+define i32 @addorlow16(i32 %x) {
+; RV32-LABEL: addorlow16:
+; RV32: # %bb.0:
+; RV32-NEXT: lui a1, 16
+; RV32-NEXT: addi a1, a1, -1
+; RV32-NEXT: add a0, a0, a1
+; RV32-NEXT: lui a1, 1048560
+; RV32-NEXT: orn a0, a0, a1
+; RV32-NEXT: ret
+;
+; RV64-LABEL: addorlow16:
+; RV64: # %bb.0:
+; RV64-NEXT: lui a1, 16
+; RV64-NEXT: addi a1, a1, -1
+; RV64-NEXT: addw a0, a0, a1
+; RV64-NEXT: lui a1, 1048560
+; RV64-NEXT: orn a0, a0, a1
+; RV64-NEXT: ret
+ %add = add nsw i32 %x, 65535
+ %or = or i32 %add, 65535
+ ret i32 %or
+}
+
+define i32 @andxorlow16(i32 %x) {
+; RV32-LABEL: andxorlow16:
+; RV32: # %bb.0:
+; RV32-NEXT: lui a1, 16
+; RV32-NEXT: addi a1, a1, -1
+; RV32-NEXT: andn a0, a1, a0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: andxorlow16:
+; RV64: # %bb.0:
+; RV64-NEXT: lui a1, 16
+; RV64-NEXT: addiw a1, a1, -1
+; RV64-NEXT: andn a0, a1, a0
+; RV64-NEXT: ret
+ %and = and i32 %x, 65535
+ %xor = xor i32 %and, 65535
+ ret i32 %xor
+}
+
+define void @orarray100(ptr %a) {
+; RV32-LABEL: orarray100:
+; RV32: # %bb.0: # %entry
+; RV32-NEXT: li a1, 0
+; RV32-NEXT: li a2, 0
+; RV32-NEXT: lui a3, 1048560
+; RV32-NEXT: .LBB8_1: # %for.body
+; RV32-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32-NEXT: slli a4, a1, 2
+; RV32-NEXT: addi a1, a1, 1
+; RV32-NEXT: add a4, a0, a4
+; RV32-NEXT: lw a5, 0(a4)
+; RV32-NEXT: seqz a6, a1
+; RV32-NEXT: add a2, a2, a6
+; RV32-NEXT: xori a6, a1, 100
+; RV32-NEXT: orn a5, a5, a3
+; RV32-NEXT: or a6, a6, a2
+; RV32-NEXT: sw a5, 0(a4)
+; RV32-NEXT: bnez a6, .LBB8_1
+; RV32-NEXT: # %bb.2: # %for.cond.cleanup
+; RV32-NEXT: ret
+;
+; RV64-LABEL: orarray100:
+; RV64: # %bb.0: # %entry
+; RV64-NEXT: addi a1, a0, 400
+; RV64-NEXT: lui a2, 1048560
+; RV64-NEXT: .LBB8_1: # %for.body
+; RV64-NEXT: # =>This Inner Loop Header: Depth=1
+; RV64-NEXT: lw a3, 0(a0)
+; RV64-NEXT: orn a3, a3, a2
+; RV64-NEXT: sw a3, 0(a0)
+; RV64-NEXT: addi a0, a0, 4
+; RV64-NEXT: bne a0, a1, .LBB8_1
+; RV64-NEXT: # %bb.2: # %for.cond.cleanup
+; RV64-NEXT: ret
+entry:
+ br label %for.body
+
+for.cond.cleanup:
+ ret void
+
+for.body:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds nuw i32, ptr %a, i64 %indvars.iv
+ %1 = load i32, ptr %arrayidx, align 4
+ %or = or i32 %1, 65535
+ store i32 %or, ptr %arrayidx, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, 100
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+define void @orarray3(ptr %a) {
+; CHECK-LABEL: orarray3:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lw a1, 0(a0)
+; CHECK-NEXT: lw a2, 4(a0)
+; CHECK-NEXT: lw a3, 8(a0)
+; CHECK-NEXT: lui a4, 1048560
+; CHECK-NEXT: orn a1, a1, a4
+; CHECK-NEXT: orn a2, a2, a4
+; CHECK-NEXT: orn a3, a3, a4
+; CHECK-NEXT: sw a1, 0(a0)
+; CHECK-NEXT: sw a2, 4(a0)
+; CHECK-NEXT: sw a3, 8(a0)
+; CHECK-NEXT: ret
+ %1 = load i32, ptr %a, align 4
+ %or = or i32 %1, 65535
+ store i32 %or, ptr %a, align 4
+ %arrayidx.1 = getelementptr inbounds nuw i8, ptr %a, i64 4
+ %2 = load i32, ptr %arrayidx.1, align 4
+ %or.1 = or i32 %2, 65535
+ store i32 %or.1, ptr %arrayidx.1, align 4
+ %arrayidx.2 = getelementptr inbounds nuw i8, ptr %a, i64 8
+ %3 = load i32, ptr %arrayidx.2, align 4
+ %or.2 = or i32 %3, 65535
+ store i32 %or.2, ptr %arrayidx.2, align 4
+ ret void
+}
+
+define i32 @andlow16(i32 %x) {
+; CHECK-LABEL: andlow16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: zext.h a0, a0
+; CHECK-NEXT: ret
+ %and = and i32 %x, 65535
+ ret i32 %and
+}
+
+define i32 @andlow24(i32 %x) {
+; RV32-LABEL: andlow24:
+; RV32: # %bb.0:
+; RV32-NEXT: slli a0, a0, 8
+; RV32-NEXT: srli a0, a0, 8
+; RV32-NEXT: ret
+;
+; RV64-LABEL: andlow24:
+; RV64: # %bb.0:
+; RV64-NEXT: slli a0, a0, 40
+; RV64-NEXT: srli a0, a0, 40
+; RV64-NEXT: ret
+ %and = and i32 %x, 16777215
+ ret i32 %and
+}
+
+define i32 @compl(i32 %x) {
+; CHECK-LABEL: compl:
+; CHECK: # %bb.0:
+; CHECK-NEXT: not a0, a0
+; CHECK-NEXT: ret
+ %not = xor i32 %x, -1
+ ret i32 %not
+}
+
+define i32 @orlow12(i32 %x) {
+; CHECK-LABEL: orlow12:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 1048575
+; CHECK-NEXT: orn a0, a0, a1
+; CHECK-NEXT: ret
+ %or = or i32 %x, 4095
+ ret i32 %or
+}
+
+define i32 @xorlow12(i32 %x) {
+; CHECK-LABEL: xorlow12:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a1, 1048575
+; CHECK-NEXT: xnor a0, a0, a1
+; CHECK-NEXT: ret
+ %xor = xor i32 %x, 4095
+ ret i32 %xor
+}
I can work on it.
✅ With the latest revision this PR passed the C/C++ code formatter.
return false;

int64_t Imm = cast<ConstantSDNode>(N)->getSExtValue();
if (!(isInt<32>(Imm) && (Imm & 0xfff) == 0xfff && Imm != -1))
Push the ! into the individual conditions and use ||.
Done
Added Zbs testing. Added an andimm64 test. Squashed the change so that the effect on tests is clear.
; RV64-NEXT: lui a1, 65281
; RV64-NEXT: slli a1, a1, 4
; RV64-NEXT: addi a1, a1, -1
; RV64-NEXT: and a0, a0, a1
Here it's possible to optimize a non-int32 constant. Getting the condition right in general seems tricky. Can I call RISCVMatInt for Imm and ~Imm to see which one is shorter?
This wouldn't be the only place that compares two getIntMatCost results, so it should be fine, as long as you check the simpler exclusionary conditions first, maybe even including your isBitwiseLogicOp loop.
Done. Covered by the
I'm happy with this.
By default, LLVM will squash-merge, so can you pre-commit the new test file before landing this?
Do you mean create a new PR for the pre-commit test?
LGTM
if ((Imm & 0xfff) != 0xfff || Imm == -1)
  return false;

for (const SDNode *U : N->uses()) {
This needs to change to N->users() now. See e6b2495
If you have commit access you can directly push it without going through GitHub.
(and X, (C<<12|0xfff)) -> (ANDN X, ~C<<12)
(or X, (C<<12|0xfff)) -> (ORN X, ~C<<12)
(xor X, (C<<12|0xfff)) -> (XNOR X, ~C<<12)

Emits better code, typically by avoiding an `ADDI HI, -1` instruction.

Co-authored-by: Craig Topper <[email protected]>
Nice work!
Thank you.
This extends PR llvm#120221 to 64-bit constants that don't match the 12-low-bits-set pattern.
This extends PR llvm#120221 to vector instructions.
RV64 only. For 32-bit constants, a negated constant is never cheaper. This change is similar to how llvm#120221 selects inverted bitwise instructions.