juj edited this page Jan 30, 2014 · 119 revisions

LLVM Backend, aka "fastcomp"

The original emscripten compiler was written in JavaScript, which was very useful for quickly prototyping new ideas during development of the various new methods needed for effective compilation to JavaScript (the relooper, longjmp tricks, C++ exceptions in JS, etc.). It is also quite stable at this point and generates very good code. However, it has a few downsides:

  • Compiler speed. The generated code is fast, but generating the code is not so fast. Especially with full optimizations on, builds can be quite slow. This is not an issue for tens of thousands of lines of code, and is annoying but not horrible for hundreds of thousands, but it is a serious problem for millions.
  • LLVM backends integrate more closely with LLVM, and can leverage LLVM's internal code analysis and optimization. The original compiler just parses LLVM bitcode externally, so it cannot benefit from internal capabilities of LLVM.
  • An upstream LLVM backend is easier to use for people than a separate project. Compiling to JS should, as much as possible, be just another backend in a compiler.

An intern, @int3, experimented with LLVM backends during Summer 2013. Recently that work has been picked up, and the current plan is to replace the core compiler with an LLVM IR backend (same type as the CppBackend), and to rely on the PNaCl legalization passes from their ABI simplification code.

Work is going on in https://github.com/kripken/emscripten-fastcomp; the project name is fastcomp.

To try this out:

  • git clone https://github.com/kripken/emscripten-fastcomp, which is based off of PNaCl's LLVM fork (https://chromium.googlesource.com/native_client/pnacl-llvm)
  • cd tools/
  • git clone https://chromium.googlesource.com/native_client/pnacl-clang clang (note: the target directory must be named "clang"; this is PNaCl's clang, which we use unchanged).
  • Build it:
  • cd .. to get back to the root of the llvm checkout
  • mkdir build and then cd build
  • ../configure --enable-optimized --disable-assertions --enable-targets=host,js (note: debug builds might not work; use only release builds for now, as this command does)
  • (Alternatively, you can use CMake instead of configure: cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;JSBackend" -DLLVM_INCLUDE_EXAMPLES=OFF -DLLVM_INCLUDE_TESTS=OFF. Replace X86 with your host target if you are on a different architecture.)
  • make -j 4 (or whatever number of cores you want to use)
  • Set it up in ~/.emscripten: set LLVM_ROOT to the path of the llvm checkout plus /build/Release/bin
  • Use emscripten's incoming branch, not master! And pull the very latest incoming. (Also make sure to pull the latest master from the fastcomp repo.)
  • To use the new compiler, build with EMCC_FAST_COMPILER=1 in the environment, for example:
   EMCC_FAST_COMPILER=1 ./emcc -O2 tests/hello_world.cpp
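
For the ~/.emscripten step, a minimal sketch of the relevant setting follows. The file uses Python syntax. The checkout location ~/emscripten-fastcomp is an assumption for illustration only, so substitute the directory you actually cloned into.

```python
import os

# Assumption: the fastcomp repository was cloned into ~/emscripten-fastcomp
# and built with the configure-based steps above.
LLVM_ROOT = os.path.join(
    os.path.expanduser('~'),   # home directory of the current user
    'emscripten-fastcomp',     # hypothetical checkout directory
    'build', 'Release', 'bin'  # build output directory named in the steps above
)
```

Any other settings already present in your ~/.emscripten file can stay as they are.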

Status

As of Dec 19th 2013, basic functionality works well enough that the backend can properly build things like bullet, python, etc., and every test in the suite that is expected to pass does pass. However, anything beyond the basics likely does not work, for example:

  • No legacy GL emulation
  • No indirectbr (branch to an unknown-at-compile-time basic block, done using e.g. gcc extensions to get the address of a block of code in C)
  • Most of the settings.js options (e.g. RESERVED_FUNCTION_POINTERS, FORCE_ALIGNMENT, etc.) have no effect. Some do work, mostly those that are toolchain or optimizer options rather than compiler options (e.g. OUTLINING_LIMIT). You should receive a compile-time error if you use a setting that is not yet supported, unless that setting was overlooked.
  • Linking of asm.js shared modules (note that normal static linking, as used by almost all projects, works fine; only the MAIN_MODULE and SIDE_MODULE options do not work)

The new backend is also a work in progress, and is focused on correctness for now, not speed. It still beats the old compiler easily on build times, but is written very inefficiently at the moment (for example, passing around lots of std::strings all the time). Once it is more complete and stable, we can move on to code cleanups and optimizations. For now, it's too early to say how fast it will end up being. To set expectations realistically, note that the compiler is not always the bottleneck when building: if you link many small files, for example, llvm-link could be the slow step instead.

The new backend also does not generate optimal code. Right now, its output can be slower than the original compiler's. However, over time it should get better than the original one, because we can use LLVM features more directly. For example, we should be able to fix some pessimistic alignment decisions the old compiler makes on function pointers, the stack and the heap.

Contributing

Please test fastcomp on your code! Everything should work if it is not mentioned in the list of limitations above. If you see a problem, file a bug.

If you feel like writing code, see the fastcomp label on the issue tracker to find things that need to be done.

It also helps to grep for EMCC_FAST_COMPILER in the test suite, to find tests that are not yet passing and see what can be done for them.
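
For those who prefer a script to grep, here is a minimal sketch of the same search in Python. The function name find_mentions, the default directory "tests" and the restriction to .py files are assumptions for illustration, not part of emscripten.

```python
import os


def find_mentions(root, needle="EMCC_FAST_COMPILER", suffixes=(".py",)):
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Only look at files with the requested extensions.
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Assumption: run from the emscripten root, where the tests/ directory lives.
    for path, number, line in find_mentions("tests"):
        print(f"{path}:{number}: {line.strip()}")
```

Each reported line is a place where a test checks the variable, which is a reasonable starting point for finding tests that are skipped or adjusted under fastcomp.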

Backend code structure

The backend is in the repo linked to above, and the code is in lib/Target/JSBackend/. The main file is JSBackend.cpp, but the other files in that directory are important too.

There is also the I64 simplification pass, currently in lib/Transforms/NaCl/ExpandI64.cpp, which should probably move to another directory.

Design

We should keep the backend as small and focused as possible. Whereas the old compiler included support for a very large number of options, the new backend should be as straightforward as possible. Optional features should be implemented modularly, as separate optional JS passes. See for example the outliner pass in the JS optimizer, and the asm linker (emlink.py), which uses asm_module.py to parse and modify compiler output.

For example, the SAFE_HEAP and CHECK_HEAP_ALIGN options check for segmentation faults and misaligned reads and writes. The old compiler included code in the core compiler itself to emit special hooks for those options as needed. In the new one, we should write a modular standalone pass to process the output and add those hooks. This is both simpler and more maintainable. (update: this has been done for SAFE_HEAP.)
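
To make the idea of a standalone pass concrete, here is a toy sketch in Python. It is not the actual SAFE_HEAP implementation: the hook names SAFE_HEAP_LOAD and SAFE_HEAP_STORE are assumed for illustration, and the regular expressions only understand accesses of the form HEAP32[identifier >> 2]. A real pass would need a proper JavaScript parser.

```python
import re

# Toy patterns (assumption): only 32-bit accesses indexed by a plain identifier.
# The store pattern requires a single "=" so that "==" comparisons are left alone.
_STORE = re.compile(r"HEAP32\[\s*(\w+)\s*>>\s*2\s*\]\s*=(?!=)\s*([^;]+);")
_LOAD = re.compile(r"HEAP32\[\s*(\w+)\s*>>\s*2\s*\]")


def add_heap_checks(js_source):
    """Return js_source with HEAP32 accesses routed through checking hooks."""
    # Rewrite stores first so their left-hand side is not mistaken for a load.
    checked = _STORE.sub(r"SAFE_HEAP_STORE(\1, \2, 4);", js_source)
    # Every remaining HEAP32 access is a read.
    return _LOAD.sub(r"SAFE_HEAP_LOAD(\1, 4)", checked)


if __name__ == "__main__":
    print(add_heap_checks("HEAP32[p>>2] = HEAP32[q>>2];"))
```

The point of the design is visible even in this toy: the backend emits plain heap accesses and knows nothing about checking, while the pass can be enabled, disabled or replaced without touching the compiler.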

Testing

The part of the test suite that is expected to pass should pass 100%. To run it, do:

EMCC_FAST_COMPILER=1 ./tests/runner.py default asm1 asm2
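
If you want to script this, the same invocation can be sketched in Python. The helper names fastcomp_env and run_suite are hypothetical, and the sketch assumes it is run from the emscripten root with a Python interpreter that the test runner supports.

```python
import os
import subprocess
import sys


def fastcomp_env(base=None):
    """Return a copy of the environment with EMCC_FAST_COMPILER=1 set."""
    env = dict(os.environ if base is None else base)
    env["EMCC_FAST_COMPILER"] = "1"
    return env


def run_suite(modes=("default", "asm1", "asm2"), runner="tests/runner.py"):
    """Run the test runner with fastcomp enabled and return its exit code."""
    command = [sys.executable, runner, *modes]
    return subprocess.call(command, env=fastcomp_env())
```

Calling run_suite() from the emscripten root is then intended to match the command line above. Passing a shorter tuple of modes runs only part of the suite.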

Status of the original compiler

The new LLVM backend will eventually become the default used by emscripten. However, the original compiler will remain supported for quite some time, as it provides some features the new compiler is not intended to provide, such as non-typed-array output. Such things are not recommended in general, and whatever is commonly recommended will be supported in the new compiler, but there can be rare situations where they make sense.
