|
1 | | -# The LLVM Compiler Infrastructure |
| 1 | +# Capstone's LLVM with refactored TableGen backends |
2 | 2 |
|
3 | | -This directory and its sub-directories contain the source code for LLVM, |
4 | | -a toolkit for the construction of highly optimized compilers, |
5 | | -optimizers, and run-time environments. |
| 3 | +The purpose of this LLVM version is to generate code for the |
| 4 | +[Capstone disassembler](https://github.com/capstone-engine/capstone). |
6 | 5 |
|
7 | | -The README briefly describes how to get started with building LLVM. |
8 | | -For more information on how to contribute to the LLVM project, please |
9 | | -take a look at the |
10 | | -[Contributing to LLVM](https://llvm.org/docs/Contributing.html) guide. |
| 6 | +It refactors the TableGen emitter backends so that they can emit C code |
| 7 | +in addition to the C++ code they normally emit. |
11 | 8 |
|
12 | | -## Getting Started with the LLVM System |
| 9 | +Note that within LLVM, an architecture is referred to as a `Target`. |
13 | 10 |
|
14 | | -Taken from [here](https://llvm.org/docs/GettingStarted.html). |
| 11 | +## Code generation |
15 | 12 |
|
16 | | -### Overview |
| 13 | +### Relevant files |
17 | 14 |
|
18 | | -Welcome to the LLVM project! |
| 15 | +The TableGen emitter backends are located in `llvm/utils/TableGen/`. |
19 | 16 |
|
20 | | -The LLVM project has multiple components. The core of the project is |
21 | | -itself called "LLVM". This contains all of the tools, libraries, and header |
22 | | -files needed to process intermediate representations and convert them into |
23 | | -object files. Tools include an assembler, disassembler, bitcode analyzer, and |
24 | | -bitcode optimizer. It also contains basic regression tests. |
| 17 | +The target definition files (`.td`), which define the |
| 18 | +instructions, operands, features etc., can be |
| 19 | +found in `llvm/lib/Target/<ARCH>/`. |
25 | 20 |
|
26 | | -C-like languages use the [Clang](http://clang.llvm.org/) frontend. This |
27 | | -component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode |
28 | | --- and from there into object files, using LLVM. |
| 21 | +### Code generation overview |
29 | 22 |
|
30 | | -Other components include: |
31 | | -the [libc++ C++ standard library](https://libcxx.llvm.org), |
32 | | -the [LLD linker](https://lld.llvm.org), and more. |
| 23 | +Generating code for a target has 6 steps: |
33 | 24 |
|
34 | | -### Getting the Source Code and Building LLVM |
| 25 | +``` |
| 26 | +                                                                         5                 6 |
| 27 | +                                                                    ┌──────────┐      ┌──────────┐ |
| 28 | +                                                                    │Printer   │      │CS .inc   │ |
| 29 | +    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │ |
| 30 | +┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘ |
| 31 | +│ .td   │     │           │     │           │     │ Code-    │  │ |
| 32 | +│ files ├────►│ TableGen  ├────►│ CodeGen   ├────►│ Emitter  │◄─┤ |
| 33 | +└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │ |
| 34 | +                     │                                 ▲        │   ┌──────────┐      ┌──────────┐ |
| 35 | +                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │ |
| 36 | +                                                                    │LLVM      │      │files     │ |
| 37 | +                                                                    └──────────┘      └──────────┘ |
| 38 | +``` |
35 | 39 |
|
36 | | -The LLVM Getting Started documentation may be out of date. The [Clang |
37 | | -Getting Started](http://clang.llvm.org/get_started.html) page might have more |
38 | | -accurate information. |
| 40 | +1. LLVM targets are defined in `.td` files. They describe instructions, operands, |
| 41 | +features and other properties. |
39 | 42 |
|
40 | | -This is an example work-flow and configuration to get and build the LLVM source: |
| 43 | +2. [LLVM TableGen](https://llvm.org/docs/TableGen/index.html) parses these files |
| 44 | +and converts them to an internal representation of [Classes, Records, DAGs](https://llvm.org/docs/TableGen/ProgRef.html) |
| 45 | + and other types. |
41 | 46 |
|
42 | | -1. Checkout LLVM (including related sub-projects like Clang): |
| 47 | +3. A TableGen component called [CodeGen](https://llvm.org/docs/CodeGenerator.html) |
| 48 | +then abstracts this even further. |
| 49 | +The result is a representation which is _not_ specific to any target |
| 50 | +(e.g. the `CodeGenInstruction` class can represent a machine instruction of any target). |
43 | 51 |
|
44 | | - * ``git clone https://github.com/llvm/llvm-project.git`` |
| 52 | +4. Different code emitter backends use the results of the former two components to |
| 53 | +generate code. |
45 | 54 |
|
46 | | - * Or, on windows, ``git clone --config core.autocrlf=false |
47 | | - https://github.com/llvm/llvm-project.git`` |
| 55 | +5. Whenever the emitter emits code, it calls a `Printer`: either `PrinterCapstone` to emit C, or `PrinterLLVM` to emit C++. |
| 56 | +Which one is used is controlled by the `--printerLang=[CCS,C++]` option passed to `llvm-tblgen`. |
48 | 57 |
|
49 | | -2. Configure and build LLVM and Clang: |
| 58 | +6. After the emitter backend is done, the `Printer` writes the `output_stream` content into the `.inc` files. |
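Steps 4 to 6 can be pictured with the following minimal C++ sketch of the printer split. Only the class names `Printer`, `PrinterCapstone` and `PrinterLLVM` and the option values `CCS`/`C++` come from this README. The `emitEnumEntry()` method, the `makePrinter()` factory, the toy `emitOpcodeEnum()` emitter and the emitted text formats are hypothetical. They are not the real interface in `llvm/utils/TableGen/`.

```cpp
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of the Printer split. The real
// classes in llvm/utils/TableGen/ have a much larger interface.
class Printer {
public:
  virtual ~Printer() = default;
  // Emitters call the printer instead of formatting text themselves.
  virtual void emitEnumEntry(const std::string &Target, const std::string &Name,
                             unsigned Value) = 0;
  // In the real tool (step 6), the buffered text is written to a .inc file.
  std::string contents() const { return OS.str(); }

protected:
  std::ostringstream OS; // stands in for the printer's output stream
};

// C output: enum names carry a target prefix (illustrative format only).
class PrinterCapstone : public Printer {
public:
  void emitEnumEntry(const std::string &Target, const std::string &Name,
                     unsigned Value) override {
    OS << "  " << Target << "_" << Name << " = " << Value << ",\n";
  }
};

// C++ output: names stay unprefixed, as they would live inside a target
// namespace (illustrative format only).
class PrinterLLVM : public Printer {
public:
  void emitEnumEntry(const std::string & /*Target*/, const std::string &Name,
                     unsigned Value) override {
    OS << "  " << Name << " = " << Value << ",\n";
  }
};

// Hypothetical factory mapping a --printerLang value to a printer.
std::unique_ptr<Printer> makePrinter(const std::string &Lang) {
  if (Lang == "CCS")
    return std::make_unique<PrinterCapstone>();
  if (Lang == "C++")
    return std::make_unique<PrinterLLVM>();
  throw std::invalid_argument("unknown printer language: " + Lang);
}

// A toy "emitter backend": it decides *what* to emit, the printer decides *how*.
std::string emitOpcodeEnum(Printer &P, const std::string &Target,
                           const std::vector<std::string> &Opcodes) {
  for (unsigned I = 0; I < Opcodes.size(); ++I)
    P.emitEnumEntry(Target, Opcodes[I], I);
  return P.contents();
}
```

With this split, an emitter backend is written once. Supporting another output language means adding another `Printer` subclass rather than touching every emitter.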
50 | 59 |
|
51 | | - * ``cd llvm-project`` |
| 60 | +### Emitter backends and their use cases |
52 | 61 |
|
53 | | - * ``cmake -S llvm -B build -G <generator> [options]`` |
| 62 | +We use the following emitter backends: |
54 | 63 |
|
55 | | - Some common build system generators are: |
| 64 | +| Name | Generated Code | Note | |
| 65 | +|------|----------------|------| |
| 66 | +| AsmMatcherEmitter | Mapping tables for Capstone | | |
| 67 | +| AsmWriterEmitter | State machine to decode the asm-string for a `MCInst` | | |
| 68 | +| DecoderEmitter | State machine which decodes bytes to a `MCInst`. | | |
| 69 | +| InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | | |
| 70 | +| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | | |
| 71 | SubtargetEmitter | Tables with information about the target features. | |
| 72 | +| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | **1.** Not all targets use this. | |
| 73 | +| | | **2.** Backend can't access the target name. Wherever the target name is needed `__ARCH__` or `##ARCH##` is printed and later replaced. | |
56 | 74 |
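The `DecoderEmitter` row mentions a "state machine which decodes bytes to a `MCInst`". The sketch below is a much-simplified, table-driven decoder that only illustrates that idea. The opcode set (`ExtractField`, `FilterValue`, `Decode`, `Fail`), the table layout and `decode()` are assumptions made for illustration. The tables generated by LLVM use a richer opcode set with variable-length operands and call target-specific decoder functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified decoder-table opcodes (not LLVM's real encoding).
enum class DecoderOp { ExtractField, FilterValue, Decode, Fail };

struct DecoderEntry {
  DecoderOp Op;
  unsigned A = 0; // ExtractField: start bit | FilterValue: expected value | Decode: opcode
  unsigned B = 0; // ExtractField: field width (< 32) | FilterValue: entry to jump to on mismatch
};

// Walks the table for one 32-bit instruction word. Returns the decoded opcode
// number, or std::nullopt if the word matches no instruction.
std::optional<unsigned> decode(const std::vector<DecoderEntry> &Table,
                               std::uint32_t Insn) {
  std::uint32_t Field = 0; // value of the most recently extracted bit field
  std::size_t PC = 0;      // index of the current table entry
  while (PC < Table.size()) {
    const DecoderEntry &E = Table[PC];
    switch (E.Op) {
    case DecoderOp::ExtractField:
      Field = (Insn >> E.A) & ((1u << E.B) - 1u);
      ++PC;
      break;
    case DecoderOp::FilterValue:
      // Continue with the next entry on a match, otherwise jump to entry B.
      PC = (Field == E.A) ? PC + 1 : E.B;
      break;
    case DecoderOp::Decode:
      // A generated decoder would also decode the operands into an MCInst here.
      return E.A;
    case DecoderOp::Fail:
      return std::nullopt;
    }
  }
  return std::nullopt;
}
```

For example, the table `{ExtractField, 28, 4}, {FilterValue, 1, 3}, {Decode, 10}, {FilterValue, 2, 5}, {Decode, 11}, {Fail}` maps words whose top four bits are `0001` to opcode 10, and `0010` to opcode 11. Everything else fails.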
|
57 | | - * ``Ninja`` --- for generating [Ninja](https://ninja-build.org) |
58 | | - build files. Most llvm developers use Ninja. |
59 | | - * ``Unix Makefiles`` --- for generating make-compatible parallel makefiles. |
60 | | - * ``Visual Studio`` --- for generating Visual Studio projects and |
61 | | - solutions. |
62 | | - * ``Xcode`` --- for generating Xcode projects. |
| 75 | +## Developer notes |
63 | 76 |
|
64 | | - Some common options: |
| 77 | +- If you find C++ code in the generated files, you need to extend `PrinterCapstone::translateToC()`. |
| 78 | +If this still doesn't fix the problem, the code snippet was not passed through `translateToC()` before it was emitted. |
| 79 | +In that case, find where this code snippet is printed and pass it through `translateToC()` there. |
65 | 80 |
|
66 | | - * ``-DLLVM_ENABLE_PROJECTS='...'`` and ``-DLLVM_ENABLE_RUNTIMES='...'`` --- |
67 | | - semicolon-separated list of the LLVM sub-projects and runtimes you'd like to |
68 | | - additionally build. ``LLVM_ENABLE_PROJECTS`` can include any of: clang, |
69 | | - clang-tools-extra, cross-project-tests, flang, libc, libclc, lld, lldb, |
70 | | - mlir, openmp, polly, or pstl. ``LLVM_ENABLE_RUNTIMES`` can include any of |
71 | | - libcxx, libcxxabi, libunwind, compiler-rt, libc or openmp. Some runtime |
72 | | - projects can be specified either in ``LLVM_ENABLE_PROJECTS`` or in |
73 | | - ``LLVM_ENABLE_RUNTIMES``. |
| 81 | +- If the mapping files lack operand types or access information, the `.td` files are incomplete (this happens surprisingly often). |
| 82 | +Search for the instructions or operands with missing or incorrect values and fix them. |
| 83 | + ``` |
| 84 | + Wrong access attributes for: |
| 85 | + - Registers, Immediates: The instruction defines its "out" and "in" operands incorrectly. |
| 86 | + - Memory: The "mayLoad" or "mayStore" variable is not set for the instruction. |
74 | 87 |
|
75 | | - For example, to build LLVM, Clang, libcxx, and libcxxabi, use |
76 | | - ``-DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi"``. |
| 88 | + Operand type is invalid: |
| 89 | + - The "OperandType" variable is unset for this operand type. |
| 90 | + ``` |
77 | 91 |
|
78 | | - * ``-DCMAKE_INSTALL_PREFIX=directory`` --- Specify for *directory* the full |
79 | | - path name of where you want the LLVM tools and libraries to be installed |
80 | | - (default ``/usr/local``). Be careful if you install runtime libraries: if |
81 | | - your system uses those provided by LLVM (like libc++ or libc++abi), you |
82 | | - must not overwrite your system's copy of those libraries, since that |
83 | | - could render your system unusable. In general, using something like |
84 | | - ``/usr`` is not advised, but ``/usr/local`` is fine. |
85 | | - |
86 | | - * ``-DCMAKE_BUILD_TYPE=type`` --- Valid options for *type* are Debug, |
87 | | - Release, RelWithDebInfo, and MinSizeRel. Default is Debug. |
88 | | - |
89 | | - * ``-DLLVM_ENABLE_ASSERTIONS=On`` --- Compile with assertion checks enabled |
90 | | - (default is Yes for Debug builds, No for all other build types). |
91 | | - |
92 | | - * ``cmake --build build [-- [options] <target>]`` or your build system specified above |
93 | | - directly. |
94 | | - |
95 | | - * The default target (i.e. ``ninja`` or ``make``) will build all of LLVM. |
96 | | - |
97 | | - * The ``check-all`` target (i.e. ``ninja check-all``) will run the |
98 | | - regression tests to ensure everything is in working order. |
99 | | - |
100 | | - * CMake will generate targets for each tool and library, and most |
101 | | - LLVM sub-projects generate their own ``check-<project>`` target. |
102 | | - |
103 | | - * Running a serial build will be **slow**. To improve speed, try running a |
104 | | - parallel build. That's done by default in Ninja; for ``make``, use the option |
105 | | - ``-j NNN``, where ``NNN`` is the number of parallel jobs to run. |
106 | | - In most cases, you get the best performance if you specify the number of CPU threads you have. |
107 | | - On some Unix systems, you can specify this with ``-j$(nproc)``. |
108 | | - |
109 | | - * For more information see [CMake](https://llvm.org/docs/CMake.html). |
110 | | - |
111 | | -Consult the |
112 | | -[Getting Started with LLVM](https://llvm.org/docs/GettingStarted.html#getting-started-with-llvm) |
113 | | -page for detailed information on configuring and compiling LLVM. You can visit |
114 | | -[Directory Layout](https://llvm.org/docs/GettingStarted.html#directory-layout) |
115 | | -to learn about the layout of the source code tree. |
116 | | - |
117 | | -## Getting in touch |
118 | | - |
119 | | -Join [LLVM Discourse forums](https://discourse.llvm.org/), [discord chat](https://discord.gg/xS7Z362) or #llvm IRC channel on [OFTC](https://oftc.net/). |
120 | | - |
121 | | -The LLVM project has adopted a [code of conduct](https://llvm.org/docs/CodeOfConduct.html) for |
122 | | -participants to all modes of communication within the project. |
| 92 | +- If certain target features (e.g. architecture extensions) were removed from LLVM, or you want to add your own, |
| 93 | +check out [DeprecatedFeatures.md](DeprecatedFeatures.md). |
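The first developer note assumes that `translateToC()` rewrites C++-only constructs into their C counterparts. The sketch below shows that idea with plain text substitution. The function `translateToCSketch()` and its two rules (`nullptr` to `NULL`, `::` to `_`) are hypothetical examples. They are not the actual rule set of `PrinterCapstone::translateToC()`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PrinterCapstone::translateToC(): applies textual
// C++ -> C rewrite rules to a code snippet, in order.
std::string translateToCSketch(std::string Code) {
  // Illustrative rules only; they are NOT the rule set of the real function.
  static const std::vector<std::pair<std::string, std::string>> Rules = {
      {"nullptr", "NULL"}, // C has no nullptr keyword before C23
      {"::", "_"},         // e.g. ARM::R0 -> ARM_R0 (assumed C enum naming)
  };
  for (const auto &Rule : Rules) {
    const std::string &From = Rule.first;
    const std::string &To = Rule.second;
    std::string::size_type Pos = 0;
    while ((Pos = Code.find(From, Pos)) != std::string::npos) {
      Code.replace(Pos, From.size(), To);
      Pos += To.size(); // skip the replacement so it is not rewritten again
    }
  }
  return Code;
}
```

A purely textual rewrite like this is naive: it would also change matches inside string literals and comments. It only shows where a new rule would go when C++ leaks into the generated C files.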
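The developer notes list wrong access attributes as a symptom of incomplete `.td` files. The following sketch shows why, under the assumption that register and immediate access is taken from the instruction's `(outs ...)`/`(ins ...)` operand lists, and memory access from the `mayLoad`/`mayStore` flags. `InstrDef`, `AccessFlags`, `operandAccess()` and `memoryAccess()` are hypothetical names, not the emitter's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Access bits (hypothetical names).
enum AccessFlags : std::uint8_t { AccNone = 0, AccRead = 1, AccWrite = 2 };

// Hypothetical, minimal view of an instruction definition from a .td file.
struct InstrDef {
  std::vector<std::string> Outs; // operand names listed in (outs ...)
  std::vector<std::string> Ins;  // operand names listed in (ins ...)
  bool MayLoad = false;          // mayLoad flag
  bool MayStore = false;         // mayStore flag
};

static bool listed(const std::vector<std::string> &List, const std::string &Op) {
  return std::find(List.begin(), List.end(), Op) != List.end();
}

// Register/immediate operand: the access is derived from the outs/ins lists,
// so an operand listed on the wrong side yields a wrong access.
unsigned operandAccess(const InstrDef &I, const std::string &Op) {
  unsigned A = AccNone;
  if (listed(I.Ins, Op))
    A |= AccRead;
  if (listed(I.Outs, Op))
    A |= AccWrite;
  return A;
}

// Memory access: derived from mayLoad/mayStore, so a missing flag means
// the load or store is not reported at all.
unsigned memoryAccess(const InstrDef &I) {
  unsigned A = AccNone;
  if (I.MayLoad)
    A |= AccRead;
  if (I.MayStore)
    A |= AccWrite;
  return A;
}
```

A load defined as `{{"Rt"}, {"Rn", "Imm"}, true, false}` reports `Rt` as written, `Rn` as read and the memory as read. The same definition without the `mayLoad` flag reports no memory access.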