Commit b47de4d

Merge pull request #10 from Rot127/tblgen_capstone_backends_aarch64
2 parents 5438df2 + 4cbdcb3 commit b47de4d

8 files changed, +526 -289 lines changed

DeprecatedFeatures.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
# Deprecated Features
2+
3+
Capstone needs to support features which were removed by LLVM in the past.
4+
Here we explain how to reintroduce them.
5+
6+
## Reintroduction
7+
8+
To get the old features back we copy them from the old `.td` files and include them in the new ones.
9+
10+
To include removed features from previous LLVM versions do the following:
11+
12+
1. Check out the last LLVM version in which the feature was present.
13+
2. Copy all feature-related definitions into a `<ARCH>Deprecated.td` file.
14+
3. Check out the newest LLVM version again.
15+
4. Wrap the different definition types in include guards. For example, the `InstrInfo` definitions could be wrapped like this:
16+
17+
```
18+
#ifndef INCLUDED_CAPSTONE_DEPR_INSTR
19+
#ifdef CAPSTONE_DEPR_INSTR
20+
#define INCLUDED_CAPSTONE_DEPR_INSTR // Ensures it is only included once
21+
22+
[Instruction definitions of removed feature]
23+
24+
#endif // INCLUDED_CAPSTONE_DEPR_INSTR
25+
#endif // CAPSTONE_DEPR_INSTR
26+
```
27+
28+
_Note that the order of `#ifndef` and `#ifdef` matters (otherwise you'll get an error from `tblgen`)._
29+
30+
5. Include the definitions in the current definition files with:
31+
32+
```
33+
#define CAPSTONE_DEPR_INSTR
34+
include "<ARCH>Deprecated.td"
35+
```
36+
37+
## Notes
38+
- You may have to change some definitions slightly,
39+
because certain classes no longer exist or were replaced (e.g. `GCCBuiltin` -> `ClangBuiltin`).
40+
- Some new processors might need to have the feature flag (`Has<DeprecatedFeature>`) added
41+
to their `UnsupportedFeatures` list.

README.md

Lines changed: 69 additions & 98 deletions
@@ -1,122 +1,93 @@
1-
# The LLVM Compiler Infrastructure
1+
# Capstone's LLVM with refactored TableGen backends
22

3-
This directory and its sub-directories contain the source code for LLVM,
4-
a toolkit for the construction of highly optimized compilers,
5-
optimizers, and run-time environments.
3+
The purpose of this LLVM version is to generate code for the
4+
[Capstone disassembler](https://github.com/capstone-engine/capstone).
65

7-
The README briefly describes how to get started with building LLVM.
8-
For more information on how to contribute to the LLVM project, please
9-
take a look at the
10-
[Contributing to LLVM](https://llvm.org/docs/Contributing.html) guide.
6+
It refactors the TableGen emitter backends, so they can emit C code
7+
in addition to the C++ code they normally emit.
118

12-
## Getting Started with the LLVM System
9+
Please note that LLVM uses the term `Target` to refer to an architecture.
1310

14-
Taken from [here](https://llvm.org/docs/GettingStarted.html).
11+
## Code generation
1512

16-
### Overview
13+
### Relevant files
1714

18-
Welcome to the LLVM project!
15+
The TableGen emitter backends are located in `llvm/utils/TableGen/`.
1916

20-
The LLVM project has multiple components. The core of the project is
21-
itself called "LLVM". This contains all of the tools, libraries, and header
22-
files needed to process intermediate representations and convert them into
23-
object files. Tools include an assembler, disassembler, bitcode analyzer, and
24-
bitcode optimizer. It also contains basic regression tests.
17+
The target definition files (`.td`), which define the
18+
instructions, operands, features etc., can be
19+
found in `llvm/lib/Target/<ARCH>/`.
2520

26-
C-like languages use the [Clang](http://clang.llvm.org/) frontend. This
27-
component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode
28-
-- and from there into object files, using LLVM.
21+
### Code generation overview
2922

30-
Other components include:
31-
the [libc++ C++ standard library](https://libcxx.llvm.org),
32-
the [LLD linker](https://lld.llvm.org), and more.
23+
Generating code for a target has 6 steps:
3324

34-
### Getting the Source Code and Building LLVM
25+
```
26+
                                                                         5                 6
27+
                                                                    ┌──────────┐      ┌──────────┐
28+
                                                                    │Printer   │      │CS .inc   │
29+
    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
30+
┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
31+
│ .td   │     │           │     │           │     │ Code-    │  │
32+
│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
33+
└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
34+
                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
35+
                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
36+
                                                                    │LLVM      │      │files     │
37+
                                                                    └──────────┘      └──────────┘
38+
```
3539

36-
The LLVM Getting Started documentation may be out of date. The [Clang
37-
Getting Started](http://clang.llvm.org/get_started.html) page might have more
38-
accurate information.
40+
1. LLVM targets are defined in `.td` files. They describe instructions, operands,
41+
features and other properties.
3942

40-
This is an example work-flow and configuration to get and build the LLVM source:
43+
2. [LLVM TableGen](https://llvm.org/docs/TableGen/index.html) parses these files
44+
and converts them to an internal representation of [Classes, Records, DAGs](https://llvm.org/docs/TableGen/ProgRef.html)
45+
and other types.
4146

42-
1. Checkout LLVM (including related sub-projects like Clang):
47+
3. A TableGen component called [CodeGen](https://llvm.org/docs/CodeGenerator.html)
48+
abstracts this even further.
49+
The result is a representation which is _not_ specific to any target
50+
(e.g. the `CodeGenInstruction` class can represent a machine instruction of any target).
4351

44-
* ``git clone https://github.com/llvm/llvm-project.git``
52+
4. Different code emitter backends use the result of the former two components to
53+
generate code.
4554

46-
* Or, on windows, ``git clone --config core.autocrlf=false
47-
https://github.com/llvm/llvm-project.git``
55+
5. Whenever the emitter emits code, it calls a `Printer`: either `PrinterCapstone` to emit C or `PrinterLLVM` to emit C++.
56+
The `--printerLang=[CCS,C++]` option passed to `llvm-tblgen` controls which one is used.
4857

49-
2. Configure and build LLVM and Clang:
58+
6. After the emitter backend is done, the `Printer` writes the `output_stream` content into the `.inc` files.
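
Steps 5 and 6 can be pictured with a small, self-contained sketch. It is not the real printer interface: `PrinterBase`, `PrinterC`, `PrinterLang`, `makePrinter`, `emitWith` and `emitFlag` are hypothetical names chosen for illustration. The only thing taken from the description above is the idea that an emitter calls one virtual printer method and the selected printer decides whether C or C++ ends up in the output stream.

```cpp
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the --printerLang choice.
enum class PrinterLang { Cpp, CapstoneC };

// Hypothetical base printer: emits C++ (plays the role of PrinterLLVM).
class PrinterBase {
public:
  explicit PrinterBase(std::ostream &OS) : OS(OS) {}
  virtual ~PrinterBase() = default;

  // Writes one boolean constant in the printer's output language.
  virtual void emitFlag(const std::string &Name, bool Value) {
    OS << "static const bool " << Name << " = "
       << (Value ? "true" : "false") << ";\n";
  }

protected:
  std::ostream &OS;
};

// Hypothetical derived printer: emits C (plays the role of PrinterCapstone).
class PrinterC : public PrinterBase {
public:
  using PrinterBase::PrinterBase;

  void emitFlag(const std::string &Name, bool Value) override {
    OS << "static const int " << Name << " = " << (Value ? 1 : 0) << ";\n";
  }
};

// Selects the printer once; the emitter only sees the base interface.
std::unique_ptr<PrinterBase> makePrinter(PrinterLang Lang, std::ostream &OS) {
  switch (Lang) {
  case PrinterLang::Cpp:
    return std::make_unique<PrinterBase>(OS);
  case PrinterLang::CapstoneC:
    return std::make_unique<PrinterC>(OS);
  }
  throw std::invalid_argument("unsupported output language");
}

// Plays the role of an emitter backend: same call, different output.
std::string emitWith(PrinterLang Lang) {
  std::ostringstream OS;
  makePrinter(Lang, OS)->emitFlag("HasFeature", true);
  return OS.str();
}
```

Calling `emitWith(PrinterLang::CapstoneC)` produces a C declaration, while `emitWith(PrinterLang::Cpp)` produces the C++ one; the emitter code itself does not change.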
5059

51-
* ``cd llvm-project``
60+
### Emitter backends and their use cases
5261

53-
* ``cmake -S llvm -B build -G <generator> [options]``
62+
We use the following emitter backends:
5463

55-
Some common build system generators are:
64+
| Name | Generated Code | Note |
65+
|------|----------------|------|
66+
| AsmMatcherEmitter | Mapping tables for Capstone | |
67+
| AsmWriterEmitter | State machine to decode the asm-string for a `MCInst` | |
68+
| DecoderEmitter | State machine which decodes bytes to a `MCInst`. | |
69+
| InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | |
70+
| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | |
71+
| SubtargetEmitter | Table about the target features. | |
72+
| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | **1.** Not all targets use this. |
73+
| | | **2.** Backend can't access the target name. Wherever the target name is needed `__ARCH__` or `##ARCH##` is printed and later replaced. |
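
The placeholder replacement mentioned in note 2 could look roughly like the following sketch. How and where the replacement actually happens is not described here, so `replaceAll` and `fillTargetName` are hypothetical helpers; only the two placeholder spellings `__ARCH__` and `##ARCH##` come from the table.

```cpp
#include <string>

// Replaces every occurrence of Placeholder in Code with Replacement.
std::string replaceAll(std::string Code, const std::string &Placeholder,
                       const std::string &Replacement) {
  if (Placeholder.empty())
    return Code;
  std::size_t Pos = 0;
  while ((Pos = Code.find(Placeholder, Pos)) != std::string::npos) {
    Code.replace(Pos, Placeholder.size(), Replacement);
    Pos += Replacement.size(); // Continue after the inserted text.
  }
  return Code;
}

// Hypothetical post-processing step: fills in the target name for both
// placeholder spellings printed by the backend.
std::string fillTargetName(const std::string &Code,
                           const std::string &TargetName) {
  return replaceAll(replaceAll(Code, "##ARCH##", TargetName), "__ARCH__",
                    TargetName);
}
```

For example, `fillTargetName("__ARCH___lookup(##ARCH##_OP)", "AArch64")` yields `AArch64_lookup(AArch64_OP)`.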
5674

57-
* ``Ninja`` --- for generating [Ninja](https://ninja-build.org)
58-
build files. Most llvm developers use Ninja.
59-
* ``Unix Makefiles`` --- for generating make-compatible parallel makefiles.
60-
* ``Visual Studio`` --- for generating Visual Studio projects and
61-
solutions.
62-
* ``Xcode`` --- for generating Xcode projects.
75+
## Developer notes
6376

64-
Some common options:
77+
- If you find C++ code within the generated files you need to extend `PrinterCapstone::translateToC()`.
78+
If this still doesn't fix the problem, the code snippet wasn't passed through `translateToC()` before emitting.
79+
So you need to figure out where this specific code snippet is printed and add `translateToC()`.
6580

66-
* ``-DLLVM_ENABLE_PROJECTS='...'`` and ``-DLLVM_ENABLE_RUNTIMES='...'`` ---
67-
semicolon-separated list of the LLVM sub-projects and runtimes you'd like to
68-
additionally build. ``LLVM_ENABLE_PROJECTS`` can include any of: clang,
69-
clang-tools-extra, cross-project-tests, flang, libc, libclc, lld, lldb,
70-
mlir, openmp, polly, or pstl. ``LLVM_ENABLE_RUNTIMES`` can include any of
71-
libcxx, libcxxabi, libunwind, compiler-rt, libc or openmp. Some runtime
72-
projects can be specified either in ``LLVM_ENABLE_PROJECTS`` or in
73-
``LLVM_ENABLE_RUNTIMES``.
81+
- If the mapping files lack operand types or access information, then the `.td` files are incomplete (happens surprisingly often).
82+
You need to search for the instruction or operands with missing or incorrect values and fix them.
83+
```
84+
Wrong access attributes for:
85+
- Registers, Immediates: The instruction defines "out" and "in" operands incorrectly.
86+
- Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.
7487
75-
For example, to build LLVM, Clang, libcxx, and libcxxabi, use
76-
``-DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi"``.
88+
Operand type is invalid:
89+
- The "OperandType" variable is unset for this operand type.
90+
```
7791

78-
* ``-DCMAKE_INSTALL_PREFIX=directory`` --- Specify for *directory* the full
79-
path name of where you want the LLVM tools and libraries to be installed
80-
(default ``/usr/local``). Be careful if you install runtime libraries: if
81-
your system uses those provided by LLVM (like libc++ or libc++abi), you
82-
must not overwrite your system's copy of those libraries, since that
83-
could render your system unusable. In general, using something like
84-
``/usr`` is not advised, but ``/usr/local`` is fine.
85-
86-
* ``-DCMAKE_BUILD_TYPE=type`` --- Valid options for *type* are Debug,
87-
Release, RelWithDebInfo, and MinSizeRel. Default is Debug.
88-
89-
* ``-DLLVM_ENABLE_ASSERTIONS=On`` --- Compile with assertion checks enabled
90-
(default is Yes for Debug builds, No for all other build types).
91-
92-
* ``cmake --build build [-- [options] <target>]`` or your build system specified above
93-
directly.
94-
95-
* The default target (i.e. ``ninja`` or ``make``) will build all of LLVM.
96-
97-
* The ``check-all`` target (i.e. ``ninja check-all``) will run the
98-
regression tests to ensure everything is in working order.
99-
100-
* CMake will generate targets for each tool and library, and most
101-
LLVM sub-projects generate their own ``check-<project>`` target.
102-
103-
* Running a serial build will be **slow**. To improve speed, try running a
104-
parallel build. That's done by default in Ninja; for ``make``, use the option
105-
``-j NNN``, where ``NNN`` is the number of parallel jobs to run.
106-
In most cases, you get the best performance if you specify the number of CPU threads you have.
107-
On some Unix systems, you can specify this with ``-j$(nproc)``.
108-
109-
* For more information see [CMake](https://llvm.org/docs/CMake.html).
110-
111-
Consult the
112-
[Getting Started with LLVM](https://llvm.org/docs/GettingStarted.html#getting-started-with-llvm)
113-
page for detailed information on configuring and compiling LLVM. You can visit
114-
[Directory Layout](https://llvm.org/docs/GettingStarted.html#directory-layout)
115-
to learn about the layout of the source code tree.
116-
117-
## Getting in touch
118-
119-
Join [LLVM Discourse forums](https://discourse.llvm.org/), [discord chat](https://discord.gg/xS7Z362) or #llvm IRC channel on [OFTC](https://oftc.net/).
120-
121-
The LLVM project has adopted a [code of conduct](https://llvm.org/docs/CodeOfConduct.html) for
122-
participants to all modes of communication within the project.
92+
- If certain target features (e.g. architecture extensions) were removed from LLVM or you want to add your own,
93+
check out [DeprecatedFeatures.md](DeprecatedFeatures.md).

llvm/utils/TableGen/AsmWriterEmitter.cpp

Lines changed: 3 additions & 2 deletions
@@ -984,14 +984,15 @@ void AsmWriterEmitter::run() {
984984
namespace llvm {
985985

986986
void EmitAsmWriter(RecordKeeper &RK, raw_ostream &OS) {
987+
CodeGenTarget CGTarget(RK);
987988
PrinterLanguage const PL = PrinterLLVM::getLanguage();
988989
PrinterLLVM *PI;
989990

990991
formatted_raw_ostream FOS(OS);
991992
if (PL == PRINTER_LANG_CPP) {
992-
PI = new PrinterLLVM(FOS);
993+
PI = new PrinterLLVM(FOS, CGTarget.getName().str());
993994
} else if (PL == PRINTER_LANG_CAPSTONE_C) {
994-
PI = new PrinterCapstone(FOS);
995+
PI = new PrinterCapstone(FOS, CGTarget.getName().str());
995996
} else {
996997
llvm_unreachable("AsmWriterEmitter does not support the given output language.");
997998
}

llvm/utils/TableGen/AsmWriterInst.cpp

Lines changed: 1 addition & 17 deletions
@@ -23,18 +23,6 @@ using namespace llvm;
2323

2424
static bool isIdentChar(char C) { return isAlnum(C) || C == '_'; }
2525

26-
static std::string resolveTemplateCall(std::string const &Dec) {
27-
unsigned const B = Dec.find_first_of("<");
28-
unsigned const E = Dec.find(">");
29-
std::string const &DecName = Dec.substr(0, B);
30-
std::string Args = Dec.substr(B + 1, E - B - 1);
31-
Args = std::regex_replace(Args, std::regex("true"), "1");
32-
Args = std::regex_replace(Args, std::regex("false"), "0");
33-
std::string Decoder =
34-
DecName + "_" + std::regex_replace(Args, std::regex("\\s*,\\s*"), "_");
35-
return Decoder;
36-
}
37-
3826
std::string AsmWriterOperand::getCode(bool PassSubtarget) const {
3927
if (OperandType == isLiteralTextOperand) {
4028
std::string Res;
@@ -67,11 +55,7 @@ std::string AsmWriterOperand::getCode(bool PassSubtarget) const {
6755
} else
6856
Result = Str;
6957

70-
if (Str.find("<") != std::string::npos &&
71-
LangCS)
72-
Result = resolveTemplateCall(Result) + "(MI";
73-
else
74-
Result = Result + "(MI";
58+
Result = Result + "(MI";
7559

7660
if (PCRel)
7761
Result += ", Address";
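
For reference, the behaviour of the removed `resolveTemplateCall` helper can be reproduced in a standalone sketch: a C++ template call such as `Name<true, 16>` is flattened into a C-compatible identifier. The name `flattenTemplateCall` is hypothetical, and the bounds check and the `\b` word boundaries are additions that the removed code did not have. Whether the new `translateToC()` behaves the same way is not visible in this diff.

```cpp
#include <regex>
#include <string>

// Flattens "Name<Arg0, Arg1>" into "Name_Arg0_Arg1" so the result is a
// valid C identifier. Strings without template arguments are returned as-is.
std::string flattenTemplateCall(const std::string &Dec) {
  const std::size_t B = Dec.find('<');
  if (B == std::string::npos)
    return Dec;
  const std::size_t E = Dec.find('>', B);
  if (E == std::string::npos)
    return Dec;

  // Text between the angle brackets.
  std::string Args = Dec.substr(B + 1, E - B - 1);
  // C has no bool literals here, so map them to integers.
  Args = std::regex_replace(Args, std::regex("\\btrue\\b"), "1");
  Args = std::regex_replace(Args, std::regex("\\bfalse\\b"), "0");
  // Join the arguments with underscores.
  Args = std::regex_replace(Args, std::regex("\\s*,\\s*"), "_");

  return Dec.substr(0, B) + "_" + Args;
}
```

With this sketch, `flattenTemplateCall("printOp<true, 16>")` gives `printOp_1_16`, which matches the `Decoder` string the removed helper would have built for the same input.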

llvm/utils/TableGen/Printer.h

Lines changed: 15 additions & 12 deletions
@@ -30,7 +30,7 @@ typedef enum {
3030
ST_NONE,
3131
ST_DECL_OS,
3232
ST_IMPL_OS,
33-
ST_ENUM_OS,
33+
ST_ENUM_SYSOPS_OS,
3434
} StreamType;
3535

3636
namespace llvm {
@@ -949,20 +949,19 @@ class PrinterLLVM {
949949
virtual void searchableTablesEmitIsContiguousCase(StringRef const &IndexName,
950950
const GenericTable &Table,
951951
const SearchIndex &Index,
952-
bool IsPrimary) const;
952+
bool IsPrimary);
953953
virtual void searchableTablesEmitIndexArrayV() const;
954954
virtual void searchableTablesEmitIndexArrayIV(
955955
std::pair<Record *, unsigned> const &Entry) const;
956956
virtual void searchableTablesEmitIndexArrayIII(ListSeparator &LS,
957957
std::string Repr) const;
958958
virtual void searchableTablesEmitIndexArrayII() const;
959959
virtual void searchableTablesEmitIndexArrayI() const;
960-
virtual void
961-
searchableTablesEmitIndexTypeStruct(const GenericTable &Table,
962-
const SearchIndex &Index) const;
960+
virtual void searchableTablesEmitIndexTypeStruct(const GenericTable &Table,
961+
const SearchIndex &Index);
963962
virtual void searchableTablesEmitReturns(const GenericTable &Table,
964963
const SearchIndex &Index,
965-
bool IsPrimary) const;
964+
bool IsPrimary);
966965
virtual void
967966
searchableTablesEmitIndexLamda(const SearchIndex &Index,
968967
StringRef const &IndexName,
@@ -991,7 +990,9 @@ class PrinterLLVM {
991990
///
992991
/// Output language: C
993992
class PrinterCapstone : public PrinterLLVM {
994-
bool DoNotEmit = false;
993+
// TODO: Toggling a flag to skip the string-based search functions
994+
// is ugly. We should support them in the future.
995+
bool EmittingNameLookup = false;
995996

996997
public:
997998
using PrinterLLVM::PrinterLLVM;
@@ -1008,6 +1009,9 @@ class PrinterCapstone : public PrinterLLVM {
10081009
bool Newline = true,
10091010
bool UndefAtEnd = false) const override;
10101011

1012+
static std::string translateToC(std::string const &TargetName,
1013+
std::string const &Dec);
1014+
10111015
//------------------------
10121016
// Backend: RegisterInfo
10131017
//------------------------
@@ -1795,20 +1799,19 @@ class PrinterCapstone : public PrinterLLVM {
17951799
void searchableTablesEmitIsContiguousCase(StringRef const &IndexName,
17961800
const GenericTable &Table,
17971801
const SearchIndex &Index,
1798-
bool IsPrimary) const override;
1802+
bool IsPrimary) override;
17991803
void searchableTablesEmitIndexArrayV() const override;
18001804
void searchableTablesEmitIndexArrayIV(
18011805
std::pair<Record *, unsigned> const &Entry) const override;
18021806
void searchableTablesEmitIndexArrayIII(ListSeparator &LS,
18031807
std::string Repr) const override;
18041808
void searchableTablesEmitIndexArrayII() const override;
18051809
void searchableTablesEmitIndexArrayI() const override;
1806-
void
1807-
searchableTablesEmitIndexTypeStruct(const GenericTable &Table,
1808-
const SearchIndex &Index) const override;
1810+
void searchableTablesEmitIndexTypeStruct(const GenericTable &Table,
1811+
const SearchIndex &Index) override;
18091812
void searchableTablesEmitReturns(const GenericTable &Table,
18101813
const SearchIndex &Index,
1811-
bool IsPrimary) const override;
1814+
bool IsPrimary) override;
18121815
void
18131816
searchableTablesEmitIndexLamda(const SearchIndex &Index,
18141817
StringRef const &IndexName,
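
A possible reading of this hunk, stated as an assumption: the three `searchableTables...` methods lose their `const` qualifier because they now toggle `EmittingNameLookup`, and a `const` member function cannot modify a non-`mutable` member. The sketch below illustrates that pattern only; `LookupPrinter`, `beginIndex` and `emitLine` are hypothetical and do not mirror the real method signatures.

```cpp
#include <sstream>
#include <string>

// Hypothetical printer that skips output while a string-keyed lookup
// function is being emitted.
class LookupPrinter {
  bool EmittingNameLookup = false;
  std::ostringstream OS;

public:
  // Must be non-const: it changes the flag. Declaring it const would
  // fail to compile unless the member were marked mutable.
  void beginIndex(bool KeyIsString) { EmittingNameLookup = KeyIsString; }

  // Drops the line while the flag is set, writes it otherwise.
  void emitLine(const std::string &Line) {
    if (!EmittingNameLookup)
      OS << Line << "\n";
  }

  // Read-only access can stay const.
  std::string str() const { return OS.str(); }
};
```

The trade-off is the one the TODO comment points at: a toggled flag is simple to add, but it couples the emit methods through hidden state.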
