Skip to content

[RISCV][MC] Implement evaluateBranch for auipc+jalr pairs #65480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 20, 2023

Conversation

mtvec
Copy link
Contributor

@mtvec mtvec commented Sep 6, 2023

This patch implements MCInstrAnalysis state in order to be able analyze auipc+jalr pairs inside evaluateBranch.

This is implemented as follows:

  • State: array of currently known GPR values;
  • Whenever an auipc is detected in updateState, update the state value of RD with the immediate;
  • Whenever a jalr is detected in evaluateBranch, check if the state holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.

This PR builds on #65479. Please only review the top commit here.

@mtvec mtvec requested a review from a team as a code owner September 6, 2023 13:45
@github-actions github-actions bot added backend:RISC-V mc Machine (object) code labels Sep 6, 2023
@mtvec mtvec force-pushed the riscv-eval-auipc-jalr branch from 4065620 to 2b9d089 Compare September 6, 2023 15:12
@mtvec
Copy link
Contributor Author

mtvec commented Sep 20, 2023

Ping.

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

@llvm/pr-subscribers-llvm-binary-utilities

Changes

This patch implements MCInstrAnalysis state in order to be able analyze auipc+jalr pairs inside evaluateBranch.

This is implemented as follows:

  • State: array of currently known GPR values;
  • Whenever an auipc is detected in updateState, update the state value of RD with the immediate;
  • Whenever a jalr is detected in evaluateBranch, check if the state holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.

This PR builds on #65479. Please only review the top commit here.


Full diff: https://github.com/llvm/llvm-project/pull/65480.diff

5 Files Affected:

  • (modified) llvm/include/llvm/MC/MCInstrAnalysis.h (+15)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp (+64)
  • (modified) llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s (+2-2)
  • (added) llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s (+45)
  • (modified) llvm/tools/llvm-objdump/llvm-objdump.cpp (+20-5)
diff --git a/llvm/include/llvm/MC/MCInstrAnalysis.h b/llvm/include/llvm/MC/MCInstrAnalysis.h
index c3c675c39c5590c..dac12af599e6f34 100644
--- a/llvm/include/llvm/MC/MCInstrAnalysis.h
+++ b/llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -37,6 +37,21 @@ class MCInstrAnalysis {
   MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
   virtual ~MCInstrAnalysis() = default;
 
+  /// Clear the internal state. See updateState for more information.
+  virtual void resetState() {}
+
+  /// Update internal state with \p Inst at \p Addr.
+  ///
+  /// For some types a analyses, inspecting a single instruction is not
+  /// sufficient. Some examples are auipc/jalr pairs on RISC-V or adrp/ldr pairs
+  /// on AArch64. To support inspecting multiple instructions, targets may keep
+  /// track of an internal state while analysing instructions. Clients should
+  /// call updateState for every instruction which allows later calls to one of
+  /// the analysis functions to take previous instructions into account.
+  /// Whenever state becomes irrelevant (e.g., when starting to disassemble a
+  /// new function), clients should call resetState to clear it.
+  virtual void updateState(const MCInst &Inst, uint64_t Addr) {}
+
   virtual bool isBranch(const MCInst &Inst) const {
     return Info->get(Inst.getOpcode()).isBranch();
   }
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index 75af5c2de09469b..5349fa0a4b30881 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -114,10 +114,65 @@ static MCTargetStreamer *createRISCVNullTargetStreamer(MCStreamer &S) {
 namespace {
 
 class RISCVMCInstrAnalysis : public MCInstrAnalysis {
+  std::optional<int64_t> GPRState[31];
+
+  static bool isGPR(unsigned Reg) {
+    return Reg >= RISCV::X0 && Reg <= RISCV::X31;
+  }
+
+  void setGPRState(unsigned Reg, std::optional<int64_t> Value) {
+    assert(isGPR(Reg) && "Invalid GPR reg");
+
+    if (Reg != RISCV::X0)
+      GPRState[Reg - RISCV::X1] = Value;
+  }
+
+  std::optional<int64_t> getGPRState(unsigned Reg) const {
+    assert(isGPR(Reg) && "Invalid GPR reg");
+
+    if (Reg == RISCV::X0)
+      return 0;
+    return GPRState[Reg - RISCV::X1];
+  }
+
 public:
   explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
       : MCInstrAnalysis(Info) {}
 
+  void resetState() override {
+    std::fill(std::begin(GPRState), std::end(GPRState), std::nullopt);
+  }
+
+  void updateState(const MCInst &Inst, uint64_t Addr) override {
+    // Terminators mark the end of a basic block which means the sequentially
+    // next instruction will be the first of another basic block and the current
+    // state will typically not be valid anymore. For calls, we assume all
+    // registers may be clobbered by the callee (TODO: should we take the
+    // calling convention into account?).
+    if (isTerminator(Inst) || isCall(Inst)) {
+      resetState();
+      return;
+    }
+
+    switch (Inst.getOpcode()) {
+    default: {
+      // Clear the state of all defined registers for instructions that we don't
+      // explicitly support.
+      auto NumDefs = Info->get(Inst.getOpcode()).getNumDefs();
+      for (unsigned I = 0; I < NumDefs; ++I) {
+        auto DefReg = Inst.getOperand(I).getReg();
+        if (isGPR(DefReg))
+          setGPRState(DefReg, std::nullopt);
+      }
+      break;
+    }
+    case RISCV::AUIPC:
+      setGPRState(Inst.getOperand(0).getReg(),
+                  Addr + (Inst.getOperand(1).getImm() << 12));
+      break;
+    }
+  }
+
   bool evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
                       uint64_t &Target) const override {
     if (isConditionalBranch(Inst)) {
@@ -140,6 +195,15 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
       return true;
     }
 
+    if (Inst.getOpcode() == RISCV::JALR) {
+      if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
+        Target = *TargetRegState + Inst.getOperand(2).getImm();
+        return true;
+      }
+
+      return false;
+    }
+
     return false;
   }
 
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
index 5fec4e6e25a39a3..ebd86a702b70e7c 100644
--- a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
@@ -57,11 +57,11 @@ c.jal bar
 c.j bar
 
 # CHECK: auipc ra, 0
-# CHECK: jalr	ra, 16(ra){{$}}
+# CHECK: jalr	ra, 16(ra) <foo+0x58>
 call .Llocal
 
 # CHECK: auipc ra, 0
-# CHECK: jalr	ra, 16(ra){{$}}
+# CHECK: jalr	ra, 16(ra) <bar>
 call bar
 
 .Llocal:
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
new file mode 100644
index 000000000000000..91b643e961fc6df
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
@@ -0,0 +1,45 @@
+# RUN: llvm-mc -filetype=obj -triple riscv32 -mattr=+c < %s | \
+# RUN:     llvm-objdump -d -M no-aliases --no-show-raw-insn - | \
+# RUN:     FileCheck %s
+
+## Test multiple interleaved auipc/jalr pairs
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: auipc t1, 0
+2: auipc t1, %pcrel_hi(bar)
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+## Target should not be printed because the call above clobbers register state
+# CHECK: jalr ra, {{[0-9]+}}(t1){{$}}
+jalr %pcrel_lo(2b)(t1)
+
+## Test that auipc+jalr with a write to the target register in between does not
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t0, 0
+li t0, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a write to an unrelated register in between does
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t1, 0
+li t1, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a terminator in between does not print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.j {{.*}} <bar>
+j bar
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
+# CHECK-LABEL: <bar>:
+bar:
+# CHECK: c.nop
+nop
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 96d74d6e2d5e865..385be9fb9257e16 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -842,7 +842,7 @@ class DisassemblerTarget {
   std::unique_ptr<const MCSubtargetInfo> SubtargetInfo;
   std::shared_ptr<MCContext> Context;
   std::unique_ptr<MCDisassembler> DisAsm;
-  std::shared_ptr<const MCInstrAnalysis> InstrAnalysis;
+  std::shared_ptr<MCInstrAnalysis> InstrAnalysis;
   std::shared_ptr<MCInstPrinter> InstPrinter;
   PrettyPrinter *Printer;
 
@@ -1265,14 +1265,19 @@ collectBBAddrMapLabels(const std::unordered_map<uint64_t, BBAddrMap> &AddrToBBAd
   }
 }
 
-static void collectLocalBranchTargets(
-    ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
-    MCInstPrinter *IP, const MCSubtargetInfo *STI, uint64_t SectionAddr,
-    uint64_t Start, uint64_t End, std::unordered_map<uint64_t, std::string> &Labels) {
+static void
+collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
+                          MCDisassembler *DisAsm, MCInstPrinter *IP,
+                          const MCSubtargetInfo *STI, uint64_t SectionAddr,
+                          uint64_t Start, uint64_t End,
+                          std::unordered_map<uint64_t, std::string> &Labels) {
   // So far only supports PowerPC and X86.
   if (!STI->getTargetTriple().isPPC() && !STI->getTargetTriple().isX86())
     return;
 
+  if (MIA)
+    MIA->resetState();
+
   Labels.clear();
   unsigned LabelCount = 0;
   Start += SectionAddr;
@@ -1298,6 +1303,9 @@ static void collectLocalBranchTargets(
           !Labels.count(Target) &&
           !(STI->getTargetTriple().isPPC() && Target == Index))
         Labels[Target] = ("L" + Twine(LabelCount++)).str();
+      MIA->updateState(Inst, Index);
+    } else if (!Disassembled && MIA) {
+      MIA->resetState();
     }
     Index += Size;
   }
@@ -1939,6 +1947,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
                                BBAddrMapLabels);
       }
 
+      if (DT->InstrAnalysis)
+        DT->InstrAnalysis->resetState();
+
       while (Index < End) {
         // ARM and AArch64 ELF binaries can interleave data and text in the
         // same section. We rely on the markers introduced to understand what
@@ -2155,6 +2166,10 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
               if (TargetOS == &CommentStream)
                 *TargetOS << "\n";
             }
+
+            DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
+          } else if (!Disassembled && DT->InstrAnalysis) {
+            DT->InstrAnalysis->resetState();
           }
         }
 

@arichardson
Copy link
Member

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

@mtvec
Copy link
Contributor Author

mtvec commented Oct 2, 2023

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

Here's a quick benchmark (release build without asserts):

$ file clang
clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped

$ size clang
     text	   data	    bss	      dec	    hex	filename
149308819	4260800	 622672	154192291	930c9a3	clang

$ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
  Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
  Range (min … max):   55.849 s … 56.511 s    10 runs

Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
  Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
  Range (min … max):   56.550 s … 56.797 s    10 runs

Summary
  ./llvm-objdump.main -d clang > /dev/null ran
    1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null

So there seems to be about 1% overhead.

If this is too much, one solution would be to not store an array of std::optional<uint64_t> but one containing just uint64_t and a separate 32-bit bitmap. I suppose that would remove most of the overhead of clearing state.

@arichardson
Copy link
Member

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

Here's a quick benchmark (release build without asserts):

$ file clang
clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped

$ size clang
     text	   data	    bss	      dec	    hex	filename
149308819	4260800	 622672	154192291	930c9a3	clang

$ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
  Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
  Range (min … max):   55.849 s … 56.511 s    10 runs

Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
  Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
  Range (min … max):   56.550 s … 56.797 s    10 runs

Summary
  ./llvm-objdump.main -d clang > /dev/null ran
    1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null

So there seems to be about 1% overhead.

If this is too much, one solution would be to not store an array of std::optional<uint64_t> but one containing just uint64_t and a separate 32-bit bitmap. I suppose that would remove most of the overhead of clearing state.

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

@mtvec
Copy link
Contributor Author

mtvec commented Oct 3, 2023

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

I implemented this and to my surprise, the overhead is still around 1%. After some digging, I noticed that the increase in runtime seems almost entirely due to the increased output size (total size increase is about 2% on the clang benchmark, the functions contributing the most to the increased runtime are output-related ones like llvm::sys::unicode::columnWidthUTF8).

Since this overhead is unavoidable, do you still feel it's worthwhile to implement the optimization?

@arichardson
Copy link
Member

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

I implemented this and to my surprise, the overhead is still around 1%. After some digging, I noticed that the increase in runtime seems almost entirely due to the increased output size (total size increase is about 2% on the clang benchmark, the functions contributing the most to the increased runtime are output-related ones like llvm::sys::unicode::columnWidthUTF8).

Since this overhead is unavoidable, do you still feel it's worthwhile to implement the optimization?

I guess it makes sense that printing and looking up symbols is more expensive than zeroing 512 bytes somewhat regularly. I think the optimization does make sense unless it makes the code a lot less readable. But I'll leave that decision to other reviewers. The change looks good to me now but please wait for a review from someone else before merging.

@mtvec
Copy link
Contributor Author

mtvec commented Oct 4, 2023

I think the optimization does make sense unless it makes the code a lot less readable.

I don't think it makes the code unreadable so I implemented the optimization.

This patch implements `MCInstrAnalysis` state in order to be able
analyze auipc+jalr pairs inside `evaluateBranch`.

This is implemented as follows:
- State: array of currently known GPR values;
- Whenever an auipc is detected in `updateState`, update the state value
  of RD with the immediate;
- Whenever a jalr is detected in `evaluateBranch`, check if the state
  holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output
of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new
one testing the output for interleaved auipc+jalr pairs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants