[RISCV] Select and/or/xor with certain constants to Zbb ANDN/ORN/XNOR #120221


Merged
pfusik merged 1 commit into llvm:main on Dec 19, 2024

Conversation


@pfusik pfusik commented Dec 17, 2024

(and X, (C<<12|0xfff)) -> (ANDN X, ~C<<12)
(or  X, (C<<12|0xfff)) -> (ORN  X, ~C<<12)
(xor X, (C<<12|0xfff)) -> (XNOR X, ~C<<12)

Emits better code, typically by avoiding an ADDI HI, -1 instruction.

Co-authored-by: Craig Topper [email protected]


pfusik commented Dec 17, 2024

I'd like to implement an optimization so that

long orlow(long x) {
    return x | ((1L << 24) - 1);
}

which is currently compiled to:

        lui     a1, 4096
        addiw   a1, a1, -1
        or      a0, a0, a1
        ret

instead compiles to:

        lui     a1, -4096
        orn     a0, a0, a1
        ret

and similarly for AND and XOR.
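
To see why a single lui suffices (an illustrative aside, not part of the patch): the constant has all of its low 12 bits set, so its complement has all of its low 12 bits clear, which is exactly the shape lui can materialize in one instruction:

    K  = (1L << 24) - 1 = 0x0000000000ffffff
    ~K =                  0xffffffffff000000   # low 12 bits zero: lui a1, -4096
    x | K  ==  x | ~(~K)                       # hence: orn a0, a0, a1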

I'm trying to implement this with TableGen patterns, but I'm new to that technology and got stuck on the following error:

[16/130 -j3] Building RISCVGenGlobalISel.inc...
FAILED: lib/Target/RISCV/RISCVGenGlobalISel.inc /home/p.fusik/upstream/llvm-project/build/lib/Target/RISCV/RISCVGenGlobalISel.inc
cd /home/p.fusik/upstream/llvm-project/build && /home/p.fusik/upstream/llvm-project/build/bin/llvm-tblgen -gen-global-isel -I /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV -I/home/p.fusik/upstream/llvm-project/build/include -I/home/p.fusik/upstream/llvm-project/llvm/include -I /home/p.fusik/upstream/llvm-project/llvm/lib/Target /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCVGISel.td --write-if-changed -o lib/Target/RISCV/RISCVGenGlobalISel.inc -d lib/Target/RISCV/RISCVGenGlobalISel.inc.d
anonymous_10632:        (or:{ *:[i32] m1:[i64] } (vt:{} GPR:{ *:[i16 i32] m1:[i16 i32 i64 f64] }:$rs1), (imm:{ *:[i32] m1:[i64] })<<P:Predicate_simm32fff>>:$imm)
Included from /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCVGISel.td:16:
Included from /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCV.td:36:
Included from /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfo.td:2113:
/home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td:499:1: error: In anonymous_10632: vt node requires exactly 0 operands!
def : PatGprSimm32fff<or, ORN>;
^
anonymous_10632:        (or:{ *:[i32] m1:[i64] } (vt:{} GPR:{ *:[i16 i32] m1:[i16 i32 i64 f64] }:$rs1), (imm:{ *:[i32] m1:[i64] })<<P:Predicate_simm32fff>>:$imm)
anonymous_10632:        (ORN:{ *:[i32] m1:[i64] } GPR:{ *:[i16 i32] m1:[i16 i32 i64 f64] }:$rs1, (NotImm:{ *:[i16 i32] m1:[i16 i32 i64 f64] } (imm:{ *:[i32] m1:[i64] })<<P:Predicate_simm32fff>>:$imm))
Included from /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCVGISel.td:16:
Included from /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCV.td:36:
Included from /home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfo.td:2113:
/home/p.fusik/upstream/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td:499:1: error: In anonymous_10632: Could not infer all types in pattern result!
def : PatGprSimm32fff<or, ORN>;
^

Please let me know how I can fix it, or whether this approach is completely wrong.

@pfusik pfusik changed the title [RISCV] Select and/or/xor with some constants to Zbb ANDN/ORN/XNOR [RISCV] Select and/or/xor with certain constants to Zbb ANDN/ORN/XNOR Dec 17, 2024
@pfusik pfusik requested review from mshockwave and topperc December 17, 2024 12:18

topperc commented Dec 17, 2024

Here's a rough sketch using a ComplexPattern instead of PatLeaf/SDNodeXForm. This lets us reuse selectImm to create the LUI.

diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index ccf34b8a6b2b..ae04b705a0f4 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3236,6 +3236,18 @@ bool RISCVDAGToDAGISel::selectSHXADD_UWOp(SDValue N, unsigned ShAmt,
   return false;
 }
 
+bool RISCVDAGToDAGISel::selectSImm32ffff(SDValue N, SDValue &Val) {
+  if (!isa<ConstantSDNode>(N))
+    return false;
+
+  int64_t Imm = cast<ConstantSDNode>(N)->getSExtValue();
+  if (!(isInt<32>(Imm) && (Imm & 0xfff) == 0xfff && Imm != -1))
+    return false;
+
+  Val = selectImm(CurDAG, SDLoc(N), N->getSimpleValueType(0), ~Imm, *Subtarget);
+  return true;
+}
+
 static bool vectorPseudoHasAllNBitUsers(SDNode *User, unsigned UserOpNo,
                                         unsigned Bits,
                                         const TargetInstrInfo *TII) {
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
index 2e738d8d25a6..410ae4ef8c3c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
@@ -119,6 +119,8 @@ public:
     return selectSHXADD_UWOp(N, ShAmt, Val);
   }
 
+  bool selectSImm32ffff(SDValue N, SDValue &Val);
+
   bool hasAllNBitUsers(SDNode *Node, unsigned Bits,
                        const unsigned Depth = 0) const;
   bool hasAllBUsers(SDNode *Node) const { return hasAllNBitUsers(Node, 8); }
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
index a78091cd02a3..ff9bdea6252b 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
@@ -475,10 +475,15 @@ def : InstAlias<"zext.h $rd, $rs", (PACKW GPR:$rd, GPR:$rs, X0)>;
 // Codegen patterns
 //===----------------------------------------------------------------------===//
 
+def simm32fff : ComplexPattern<XLenVT, 1, "selectSImm32ffff", [], [], 0>;
+
 let Predicates = [HasStdExtZbbOrZbkb] in {
 def : Pat<(XLenVT (and GPR:$rs1, (not GPR:$rs2))), (ANDN GPR:$rs1, GPR:$rs2)>;
 def : Pat<(XLenVT (or  GPR:$rs1, (not GPR:$rs2))), (ORN  GPR:$rs1, GPR:$rs2)>;
 def : Pat<(XLenVT (xor GPR:$rs1, (not GPR:$rs2))), (XNOR GPR:$rs1, GPR:$rs2)>;
+
+def : Pat<(XLenVT (or GPR:$rs1, simm32fff:$rs2)), (ORN GPR:$rs1, simm32fff:$rs2)>;
+
 } // Predicates = [HasStdExtZbbOrZbkb]
 
 let Predicates = [HasStdExtZbbOrZbkb] in {


topperc commented Dec 17, 2024

Here's another case that might be useful to optimize: https://godbolt.org/z/MEzP15sas. It already generates a not, but we don't fold it.


pfusik commented Dec 18, 2024

Force pushed because of a precommit test.

@@ -21,8 +21,7 @@ define i1 @pr84653(i32 %x) {
; CHECK-ZBB: # %bb.0:
; CHECK-ZBB-NEXT: sext.w a1, a0
; CHECK-ZBB-NEXT: lui a2, 524288
pfusik (Author):

The lui is unchanged here because this is the X ^ ((1 << 31) - 1) -> X ^ ~(1 << 31) case: the inverted constant still needs the same lui a2, 524288; only the addi goes away.

@@ -273,37 +240,21 @@ define i32 @compl(i32 %x) {
}

define i32 @orlow12(i32 %x) {
pfusik (Author):

With Zbs this is:

    ori     a0, a0, 2047
    bseti   a0, a0, 11

and is not affected by this change. Shall I add +zbs or test both with and without?

Member:

I would add RUN lines with +zbs, rather than adding +zbs to the existing RUN lines.

pfusik (Author):

Done


pfusik commented Dec 18, 2024

> Here's a rough sketch using a ComplexPattern instead of PatLeaf/SDNodeXForm.

This works great!

I didn't even have to special-case AND for the sll+srl lowering, etc., but I don't know why. Does pattern order matter, or is there some scoring mechanism?

@pfusik pfusik marked this pull request as ready for review December 18, 2024 14:05

llvmbot commented Dec 18, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Piotr Fusik (pfusik)

Changes
(and X, (C<<12|0xfff)) -> (ANDN X, ~C<<12)
(or  X, (C<<12|0xfff)) -> (ORN  X, ~C<<12)
(xor X, (C<<12|0xfff)) -> (XNOR X, ~C<<12)

Saves an ADDI HI, -1 instruction.


Full diff: https://github.com/llvm/llvm-project/pull/120221.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (+12)
  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h (+2)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoZb.td (+6)
  • (modified) llvm/test/CodeGen/RISCV/pr84653_pr85190.ll (+2-4)
  • (added) llvm/test/CodeGen/RISCV/zbb-logic-neg-imm.ll (+260)
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index ccf34b8a6b2b02..d77e2b1421b136 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3236,6 +3236,18 @@ bool RISCVDAGToDAGISel::selectSHXADD_UWOp(SDValue N, unsigned ShAmt,
   return false;
 }
 
+bool RISCVDAGToDAGISel::selectSImm32fff(SDValue N, SDValue &Val) {
+  if (!isa<ConstantSDNode>(N))
+    return false;
+
+  int64_t Imm = cast<ConstantSDNode>(N)->getSExtValue();
+  if (!(isInt<32>(Imm) && (Imm & 0xfff) == 0xfff && Imm != -1))
+    return false;
+
+  Val = selectImm(CurDAG, SDLoc(N), N->getSimpleValueType(0), ~Imm, *Subtarget);
+  return true;
+}
+
 static bool vectorPseudoHasAllNBitUsers(SDNode *User, unsigned UserOpNo,
                                         unsigned Bits,
                                         const TargetInstrInfo *TII) {
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
index 2e738d8d25a6dc..cfe07277fd9ddf 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
@@ -119,6 +119,8 @@ class RISCVDAGToDAGISel : public SelectionDAGISel {
     return selectSHXADD_UWOp(N, ShAmt, Val);
   }
 
+  bool selectSImm32fff(SDValue N, SDValue &Val);
+
   bool hasAllNBitUsers(SDNode *Node, unsigned Bits,
                        const unsigned Depth = 0) const;
   bool hasAllBUsers(SDNode *Node) const { return hasAllNBitUsers(Node, 8); }
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
index a78091cd02a35f..e2050d100e4f88 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
@@ -475,10 +475,16 @@ def : InstAlias<"zext.h $rd, $rs", (PACKW GPR:$rd, GPR:$rs, X0)>;
 // Codegen patterns
 //===----------------------------------------------------------------------===//
 
+def simm32fff : ComplexPattern<XLenVT, 1, "selectSImm32fff", [], [], 0>;
+
 let Predicates = [HasStdExtZbbOrZbkb] in {
 def : Pat<(XLenVT (and GPR:$rs1, (not GPR:$rs2))), (ANDN GPR:$rs1, GPR:$rs2)>;
 def : Pat<(XLenVT (or  GPR:$rs1, (not GPR:$rs2))), (ORN  GPR:$rs1, GPR:$rs2)>;
 def : Pat<(XLenVT (xor GPR:$rs1, (not GPR:$rs2))), (XNOR GPR:$rs1, GPR:$rs2)>;
+
+def : Pat<(XLenVT (and GPR:$rs1, simm32fff:$rs2)), (ANDN GPR:$rs1, simm32fff:$rs2)>;
+def : Pat<(XLenVT (or  GPR:$rs1, simm32fff:$rs2)), (ORN  GPR:$rs1, simm32fff:$rs2)>;
+def : Pat<(XLenVT (xor GPR:$rs1, simm32fff:$rs2)), (XNOR GPR:$rs1, simm32fff:$rs2)>;
 } // Predicates = [HasStdExtZbbOrZbkb]
 
 let Predicates = [HasStdExtZbbOrZbkb] in {
diff --git a/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll b/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll
index b1bba5fdc92116..30a93557347727 100644
--- a/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll
+++ b/llvm/test/CodeGen/RISCV/pr84653_pr85190.ll
@@ -21,8 +21,7 @@ define i1 @pr84653(i32 %x) {
 ; CHECK-ZBB:       # %bb.0:
 ; CHECK-ZBB-NEXT:    sext.w a1, a0
 ; CHECK-ZBB-NEXT:    lui a2, 524288
-; CHECK-ZBB-NEXT:    addi a2, a2, -1
-; CHECK-ZBB-NEXT:    xor a0, a0, a2
+; CHECK-ZBB-NEXT:    xnor a0, a0, a2
 ; CHECK-ZBB-NEXT:    sext.w a0, a0
 ; CHECK-ZBB-NEXT:    max a0, a0, zero
 ; CHECK-ZBB-NEXT:    slt a0, a0, a1
@@ -82,8 +81,7 @@ define i1 @select_to_or(i32 %x) {
 ; CHECK-ZBB:       # %bb.0:
 ; CHECK-ZBB-NEXT:    sext.w a1, a0
 ; CHECK-ZBB-NEXT:    lui a2, 524288
-; CHECK-ZBB-NEXT:    addi a2, a2, -1
-; CHECK-ZBB-NEXT:    xor a0, a0, a2
+; CHECK-ZBB-NEXT:    xnor a0, a0, a2
 ; CHECK-ZBB-NEXT:    sext.w a0, a0
 ; CHECK-ZBB-NEXT:    min a0, a0, zero
 ; CHECK-ZBB-NEXT:    slt a0, a0, a1
diff --git a/llvm/test/CodeGen/RISCV/zbb-logic-neg-imm.ll b/llvm/test/CodeGen/RISCV/zbb-logic-neg-imm.ll
new file mode 100644
index 00000000000000..86c9676bec1280
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/zbb-logic-neg-imm.ll
@@ -0,0 +1,260 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 -mattr=+zbb -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,RV32
+; RUN: llc -mtriple=riscv64 -mattr=+zbb -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,RV64
+
+define i32 @and0xabcdefff(i32 %x) {
+; CHECK-LABEL: and0xabcdefff:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a1, 344865
+; CHECK-NEXT:    andn a0, a0, a1
+; CHECK-NEXT:    ret
+  %and = and i32 %x, -1412567041
+  ret i32 %and
+}
+
+define i32 @orlow13(i32 %x) {
+; CHECK-LABEL: orlow13:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a1, 1048574
+; CHECK-NEXT:    orn a0, a0, a1
+; CHECK-NEXT:    ret
+  %or = or i32 %x, 8191
+  ret i32 %or
+}
+
+define i64 @orlow24(i64 %x) {
+; RV32-LABEL: orlow24:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lui a2, 1044480
+; RV32-NEXT:    orn a0, a0, a2
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: orlow24:
+; RV64:       # %bb.0:
+; RV64-NEXT:    lui a1, 1044480
+; RV64-NEXT:    orn a0, a0, a1
+; RV64-NEXT:    ret
+  %or = or i64 %x, 16777215
+  ret i64 %or
+}
+
+define i32 @xorlow16(i32 %x) {
+; CHECK-LABEL: xorlow16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a1, 1048560
+; CHECK-NEXT:    xnor a0, a0, a1
+; CHECK-NEXT:    ret
+  %xor = xor i32 %x, 65535
+  ret i32 %xor
+}
+
+define i32 @xorlow31(i32 %x) {
+; CHECK-LABEL: xorlow31:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a1, 524288
+; CHECK-NEXT:    xnor a0, a0, a1
+; CHECK-NEXT:    ret
+  %xor = xor i32 %x, 2147483647
+  ret i32 %xor
+}
+
+define i32 @oraddlow16(i32 %x) {
+; RV32-LABEL: oraddlow16:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lui a1, 1048560
+; RV32-NEXT:    orn a0, a0, a1
+; RV32-NEXT:    lui a1, 16
+; RV32-NEXT:    addi a1, a1, -1
+; RV32-NEXT:    add a0, a0, a1
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: oraddlow16:
+; RV64:       # %bb.0:
+; RV64-NEXT:    lui a1, 1048560
+; RV64-NEXT:    orn a0, a0, a1
+; RV64-NEXT:    lui a1, 16
+; RV64-NEXT:    addi a1, a1, -1
+; RV64-NEXT:    addw a0, a0, a1
+; RV64-NEXT:    ret
+  %or = or i32 %x, 65535
+  %add = add nsw i32 %or, 65535
+  ret i32 %add
+}
+
+define i32 @addorlow16(i32 %x) {
+; RV32-LABEL: addorlow16:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lui a1, 16
+; RV32-NEXT:    addi a1, a1, -1
+; RV32-NEXT:    add a0, a0, a1
+; RV32-NEXT:    lui a1, 1048560
+; RV32-NEXT:    orn a0, a0, a1
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: addorlow16:
+; RV64:       # %bb.0:
+; RV64-NEXT:    lui a1, 16
+; RV64-NEXT:    addi a1, a1, -1
+; RV64-NEXT:    addw a0, a0, a1
+; RV64-NEXT:    lui a1, 1048560
+; RV64-NEXT:    orn a0, a0, a1
+; RV64-NEXT:    ret
+  %add = add nsw i32 %x, 65535
+  %or = or i32 %add, 65535
+  ret i32 %or
+}
+
+define i32 @andxorlow16(i32 %x) {
+; RV32-LABEL: andxorlow16:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lui a1, 16
+; RV32-NEXT:    addi a1, a1, -1
+; RV32-NEXT:    andn a0, a1, a0
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: andxorlow16:
+; RV64:       # %bb.0:
+; RV64-NEXT:    lui a1, 16
+; RV64-NEXT:    addiw a1, a1, -1
+; RV64-NEXT:    andn a0, a1, a0
+; RV64-NEXT:    ret
+  %and = and i32 %x, 65535
+  %xor = xor i32 %and, 65535
+  ret i32 %xor
+}
+
+define void @orarray100(ptr %a) {
+; RV32-LABEL: orarray100:
+; RV32:       # %bb.0: # %entry
+; RV32-NEXT:    li a1, 0
+; RV32-NEXT:    li a2, 0
+; RV32-NEXT:    lui a3, 1048560
+; RV32-NEXT:  .LBB8_1: # %for.body
+; RV32-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32-NEXT:    slli a4, a1, 2
+; RV32-NEXT:    addi a1, a1, 1
+; RV32-NEXT:    add a4, a0, a4
+; RV32-NEXT:    lw a5, 0(a4)
+; RV32-NEXT:    seqz a6, a1
+; RV32-NEXT:    add a2, a2, a6
+; RV32-NEXT:    xori a6, a1, 100
+; RV32-NEXT:    orn a5, a5, a3
+; RV32-NEXT:    or a6, a6, a2
+; RV32-NEXT:    sw a5, 0(a4)
+; RV32-NEXT:    bnez a6, .LBB8_1
+; RV32-NEXT:  # %bb.2: # %for.cond.cleanup
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: orarray100:
+; RV64:       # %bb.0: # %entry
+; RV64-NEXT:    addi a1, a0, 400
+; RV64-NEXT:    lui a2, 1048560
+; RV64-NEXT:  .LBB8_1: # %for.body
+; RV64-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64-NEXT:    lw a3, 0(a0)
+; RV64-NEXT:    orn a3, a3, a2
+; RV64-NEXT:    sw a3, 0(a0)
+; RV64-NEXT:    addi a0, a0, 4
+; RV64-NEXT:    bne a0, a1, .LBB8_1
+; RV64-NEXT:  # %bb.2: # %for.cond.cleanup
+; RV64-NEXT:    ret
+entry:
+  br label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds nuw i32, ptr %a, i64 %indvars.iv
+  %1 = load i32, ptr %arrayidx, align 4
+  %or = or i32 %1, 65535
+  store i32 %or, ptr %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+define void @orarray3(ptr %a) {
+; CHECK-LABEL: orarray3:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lw a1, 0(a0)
+; CHECK-NEXT:    lw a2, 4(a0)
+; CHECK-NEXT:    lw a3, 8(a0)
+; CHECK-NEXT:    lui a4, 1048560
+; CHECK-NEXT:    orn a1, a1, a4
+; CHECK-NEXT:    orn a2, a2, a4
+; CHECK-NEXT:    orn a3, a3, a4
+; CHECK-NEXT:    sw a1, 0(a0)
+; CHECK-NEXT:    sw a2, 4(a0)
+; CHECK-NEXT:    sw a3, 8(a0)
+; CHECK-NEXT:    ret
+  %1 = load i32, ptr %a, align 4
+  %or = or i32 %1, 65535
+  store i32 %or, ptr %a, align 4
+  %arrayidx.1 = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %2 = load i32, ptr %arrayidx.1, align 4
+  %or.1 = or i32 %2, 65535
+  store i32 %or.1, ptr %arrayidx.1, align 4
+  %arrayidx.2 = getelementptr inbounds nuw i8, ptr %a, i64 8
+  %3 = load i32, ptr %arrayidx.2, align 4
+  %or.2 = or i32 %3, 65535
+  store i32 %or.2, ptr %arrayidx.2, align 4
+  ret void
+}
+
+define i32 @andlow16(i32 %x) {
+; CHECK-LABEL: andlow16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    zext.h a0, a0
+; CHECK-NEXT:    ret
+  %and = and i32 %x, 65535
+  ret i32 %and
+}
+
+define i32 @andlow24(i32 %x) {
+; RV32-LABEL: andlow24:
+; RV32:       # %bb.0:
+; RV32-NEXT:    slli a0, a0, 8
+; RV32-NEXT:    srli a0, a0, 8
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: andlow24:
+; RV64:       # %bb.0:
+; RV64-NEXT:    slli a0, a0, 40
+; RV64-NEXT:    srli a0, a0, 40
+; RV64-NEXT:    ret
+  %and = and i32 %x, 16777215
+  ret i32 %and
+}
+
+define i32 @compl(i32 %x) {
+; CHECK-LABEL: compl:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    not a0, a0
+; CHECK-NEXT:    ret
+  %not = xor i32 %x, -1
+  ret i32 %not
+}
+
+define i32 @orlow12(i32 %x) {
+; CHECK-LABEL: orlow12:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a1, 1048575
+; CHECK-NEXT:    orn a0, a0, a1
+; CHECK-NEXT:    ret
+  %or = or i32 %x, 4095
+  ret i32 %or
+}
+
+define i32 @xorlow12(i32 %x) {
+; CHECK-LABEL: xorlow12:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a1, 1048575
+; CHECK-NEXT:    xnor a0, a0, a1
+; CHECK-NEXT:    ret
+  %xor = xor i32 %x, 4095
+  ret i32 %xor
+}


pfusik commented Dec 18, 2024

> Here's another case that might be useful to optimize: https://godbolt.org/z/MEzP15sas. It already generates a not, but we don't fold it.

I can work on it.


github-actions bot commented Dec 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

return false;

int64_t Imm = cast<ConstantSDNode>(N)->getSExtValue();
if (!(isInt<32>(Imm) && (Imm & 0xfff) == 0xfff && Imm != -1))
Collaborator:

Push the ! into the individual conditions and use ||.
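
Applied to the snippet above, that gives (a direct De Morgan rewrite of the condition; a sketch, not necessarily the final code):

    if (!isInt<32>(Imm) || (Imm & 0xfff) != 0xfff || Imm == -1)
      return false;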

pfusik (Author):

Done

@pfusik pfusik left a comment

Added Zbs testing and an andimm64 test. Squashed the changes so that the effect on the tests is clear.

; RV64-NEXT: lui a1, 65281
; RV64-NEXT: slli a1, a1, 4
; RV64-NEXT: addi a1, a1, -1
; RV64-NEXT: and a0, a0, a1
pfusik (Author):

Here it's possible to optimize a non-int32 constant. Getting the condition right in general seems tricky. Can I call RISCVMatInt for Imm and ~Imm to see which one is shorter?

Member:

This wouldn't be the only place that compares two getIntMatCost results, so it should be fine, as long as you check the simpler exclusionary conditions first, maybe even including your isBitwiseLogicOp loop.
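
A rough sketch of that comparison (my illustration, assuming the in-tree RISCVMatInt::getIntMatCost(Val, Size, STI) signature; not the code that landed):

    // After the cheap structural checks, compare materialization costs and
    // prefer the inverted constant only when it is strictly cheaper.
    int OrigCost = RISCVMatInt::getIntMatCost(APInt(64, Imm), 64, *Subtarget);
    int InvCost = RISCVMatInt::getIntMatCost(APInt(64, ~Imm), 64, *Subtarget);
    if (InvCost >= OrigCost)
      return false;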


pfusik commented Dec 19, 2024

> Here's another case that might be useful to optimize: https://godbolt.org/z/MEzP15sas. It already generates a not, but we don't fold it.

Done. Covered by the andimm64srli test.

@lenary lenary left a comment

I'm happy with this.

By default, LLVM will squash-merge, so can you pre-commit the new test file before landing this?


pfusik commented Dec 19, 2024

> By default, LLVM will squash-merge, so can you pre-commit the new test file before landing this?

Do you mean create a new PR for the pre-commit test?

@topperc topperc left a comment

LGTM

if ((Imm & 0xfff) != 0xfff || Imm == -1)
return false;

for (const SDNode *U : N->uses()) {
@topperc topperc Dec 19, 2024

This needs to change to N->users() now. See e6b2495.
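
That is, a one-line sketch of the rename:

    for (const SDNode *U : N->users()) {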


topperc commented Dec 19, 2024

> By default, LLVM will squash-merge, so can you pre-commit the new test file before landing this?

> Do you mean create a new PR for the pre-commit test?

If you have commit access, you can push it directly without going through GitHub.

    (and X, (C<<12|0xfff)) -> (ANDN X, ~C<<12)
    (or  X, (C<<12|0xfff)) -> (ORN  X, ~C<<12)
    (xor X, (C<<12|0xfff)) -> (XNOR X, ~C<<12)

Emits better code, typically by avoiding an `ADDI HI, -1` instruction.

Co-authored-by: Craig Topper <[email protected]>
@pfusik pfusik merged commit 6e7312b into llvm:main Dec 19, 2024
8 checks passed

lenary commented Dec 23, 2024

Nice work!


pfusik commented Dec 24, 2024

Thank you.

pfusik added a commit to pfusik/llvm-project that referenced this pull request Jan 13, 2025
This extends PR llvm#120221 to 64-bit constants that don't match
the 12-low-bits-set pattern.
pfusik added a commit that referenced this pull request Jan 14, 2025
This extends PR #120221 to 64-bit constants that don't match
the 12-low-bits-set pattern.
pfusik added a commit to pfusik/llvm-project that referenced this pull request Jan 17, 2025
pfusik added a commit to pfusik/llvm-project that referenced this pull request Jan 20, 2025
pfusik added a commit to pfusik/llvm-project that referenced this pull request Jan 22, 2025
pfusik added a commit that referenced this pull request Jan 22, 2025
pfusik added a commit to pfusik/llvm-project that referenced this pull request Apr 25, 2025
RV64 only. For 32-bit constants, a negated constant is never cheaper.

This change is similar to how llvm#120221 selects inverted bitwise instructions.
pfusik added a commit that referenced this pull request Apr 28, 2025
…#137309)

RV64 only. For 32-bit constants, a negated constant is never cheaper.

This change is similar to how #120221 selects inverted bitwise instructions.