Skip to content
Merged
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
235 changes: 235 additions & 0 deletions SetjmpLongjmp.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,235 @@
# C setjmp/longjmp in WebAssembly

## Overview

This document describes a convention to implement C setjmp/longjmp via
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this convention is now the default for LLVM, right? I think this document should say that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's documented in the "Implementations" section.

[WebAssembly exception-handling proposal].

This document also briefly mentions another convention based on JavaScript
exceptions.

[WebAssembly exception-handling proposal]: https://github.com/WebAssembly/exception-handling

## Runtime ABI

### Linear memory structures

This convention uses a few structures on the WebAssembly linear memory.

#### Reserved area in jmp_buf

The first 6 words of C jmp_buf is reserved for the use by the runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a note here making it explicit that the contents of these 6 words are not public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

("words" here are C pointer types specified in the [C ABI].)
It should have large enough alignment to store C pointers.
The actual contents of this area are private to the runtime implementation.

[C ABI]: BasicCABI.md

##### Notes about the size of reserved area in jmp_buf

Emscripten has been using 6 `unsigned long`s. (`unsigned long [6]`)

GCC and Clang uses `intptr_t [5]` for their [setjmp/longjmp builtins].
It isn't relevant right now though, because LLVM's WebAssembly target
doesn't provide these builtins.

[setjmp/longjmp builtins]: https://gcc.gnu.org/onlinedocs/gcc/Nonlocal-Gotos.html

#### __WasmLongjmpArgs

An equivalent of the following structure is used to associate necessary
data to the WebAssembly exception.

```c
struct __WasmLongjmpArgs {
void *env; // a pointer to jmp_buf
int val;
};
```

The lifetime of this structure is rather short. It lives only during a
single longjmp execution.
A runtime can use a part of `jmp_buf` for this structure. It's also ok to use
a separate thread-local storage to place this structure. A runtime without
multi-threading support can simply place this structure in a global variable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this __WasmLongjmpArgs struct exposed as part of the ABI between libc and the compiler? If it isn't, would it make sense to remove this section, or at least add a caveat that this section is non-normative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a part of the ABI. the compiler-generated code needs to know how to read members of this structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misunderstood that part then.

In that case, I wonder if it would make sense to further simplify the ABI, from this:

      pop __WasmLongjmpArgs pointer from the operand stack
      $env = __WasmLongjmpArgs.env
      $val = __WasmLongjmpArgs.val
      $label = $__wasm_setjmp_test($env, $func_invocation_id)
      if ($label == 0) {
         ;; not for us. rethrow.
         call $__wasm_longjmp($env, $val)
      }

to this:

      $args = pop __WasmLongJmpArgs pointer from the operand stack
      $label = $__wasm_setjmp_test($args, $func_invocation_id)

doing the loading of $val and $env, as well as the if ($label == 0) and the __wasm_longjmp call while we're at it, all within the __wasm_setjmp_test call.

That way, we'd have less code inline. Would that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, at least $val needs to be visible to the catching logic as it's the return value of setjmp()
while we can make __wasm_setjmp_test somehow return $val as well, it seems like the opposite (at least incompatible) direction from multivalue TODO, where we can make $env and $val direct parameters of the exception.
personally i'm ok with either ways.
@aheejin do you have any preference? (asking because i guess the multivalue TODO comment is yours.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @yamt said, val needs to be accessible from the catching code, so I'm not sure what's the point of passing arg to __wasm_setjmp_test and making it return val. Also

if ($label == 0) {
  ;; not for us. rethrow.
  call $__wasm_longjmp($env, $val)
}

this part cannot go into __wasm_setjmp_test, no? It doesn't look like we can save a ton here, regardsless of whether we do multivalue or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part cannot go into __wasm_setjmp_test, no?

i guess it can. why not?
__wasm_setjmp_test can just call __wasm_longjmp internally. or throw the exception directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, you're right. Would you like to submit a PR to the LLVM repo doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, you're right. Would you like to submit a PR to the LLVM repo doing this?

do you only mean "make __wasm_setjmp_test rethrow"?
or what sunfishcode suggested?
or both?

i added them to the "Future directions" section for now.


### Exception tag

This convention uses a WebAssembly exception to perform a non-local jump
for C `longjmp`.

The exception is created with an exception tag named `__c_longjmp`.
The name is used for both of [static linking](Linking.md) and
[dynamic linking](DynamicLinking.md).
The type of exception tag is `(param i32)`. (Or, `(param i64)` for [memory64])
The parameter is the address of the `__WasmLongjmpArgs` structure on the
linear memory.

[memory64]: https://github.com/WebAssembly/memory64

### Functions

```c
void __wasm_setjmp(jmp_buf env, uint32_t label, void *func_invocation_id);
uint32_t __wasm_setjmp_test(jmp_buf env, void *func_invocation_id);
void __wasm_longjmp(jmp_buf env, int val);
```

`__wasm_setjmp` records the necessary data in the `env` so that it can be
used by `__wasm_longjmp` later.
`label` is a non-zero identifier to distinguish setjmp call-sites within
the function. Note that a C function can contain multiple setjmp() calls.
`func_invocation_id` is the identifier to distinguish invocations of this
C function. Note that, when a C function which calls setjmp() is invoked
recursively, setjmp/longjmp needs to distinguish them.

`__wasm_setjmp_test` tests if the longjmp target belongs to the current
function invocation. if it does, this function returns the `label` value
saved by `__wasm_setjmp`. Otherwise, it returns 0.

`__wasm_longjmp` is similar to C `longjmp`.
If `val` is 0, it's `__wasm_longjmp`'s responsibility to convert it to 1.
It performs a long jump by filling a `__WasmLongjmpArgs` structure and
throwing an exception with its address. The exception is created with
the `__c_longjmp` exception tag.

## Code conversion

The C compiler detects `setjmp` and `longjmp` calls in a program and
converts them into the corresponding WebAssembly exception-handling
instructions and calls to the above mentioned runtime ABI.

### Functions calling setjmp()

On the function entry, the compiler would generate the logic to create
the identifier of this function invocation, typically by performing an
equivalent of `alloca(1)`. Note that the alloca size is not important
because the pointer is merely used as an identifier and never be dereferenced.

Also, the compiler converts C `setjmp` calls to `__wasm_setjmp` calls.

For each setjmp callsite, the compiler allocates non-zero identifier called
"label". The label value passed to `__wasm_setjmp` is recorded by the
runtime and returned by later `__wasm_setjmp_test` when processing a longjmp
to the corresponding jmp_buf.

Also, for code blocks which possibly call `longjmp` directly or indirectly,
the compiler generates instructions to catch and process exceptions with
the `__c_longjmp` exception tag accordingly.

When catching the exception, the compiler-generated logic calls
`__wasm_setjmp_test` to see if the exception is for this invocation
of this function.
If it is, `__wasm_setjmp_test` returns the non-zero label value recorded by
the last `__wasm_setjmp` call for the jmp_buf. The compiler-generated logic
can use the label value to pretend a return from the corresponding setjmp.
Otherwise, `__wasm_setjmp_test` returns 0. In that case, the
compiler-generated logic should rethrow the exception by calling
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably it's also ok to rethrow with delegate/throw_ref?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, I think we can rethrow (or throw_ref) it. Not sure whether we need delegate though. But given that it is discouraged to use the current rethrow instruction in LLVM because it is replaced by the new throw_ref, and the new throw_ref has not been implemented in LLVM yet, we can probably try this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i meant rethrow, not delegate.

`__wasm_longjmp` so that it can be eventually caught by the right function.

For an example, a C function like this would be converted like
the following pseudo code.
```c
void f(void) {
jmp_buf env;
if (!setjmp(env)) {
might_call_longjmp(env);
}
}
```

```wat
$func_invocation_id = alloca(1)

;; 100 is a label generated by the compiler
call $__wasm_setjmp($env, 100, $func_invocation_id)

block
block (result i32)
try_table (catch $__c_longjmp 0)
call $might_call_longjmp
end
;; might_call_longjmp didn't call longjmp
br 1
end
;; might_call_longjmp called longjmp
pop __WasmLongjmpArgs pointer from the operand stack
$env = __WasmLongjmpArgs.env
$val = __WasmLongjmpArgs.val
$label = $__wasm_setjmp_test($env, $func_invocation_id)
if ($label == 0) {
;; not for us. rethrow.
call $__wasm_longjmp($env, $val)
}
;; ours.
;; somehow jump to the block corresponding to the $label
...
...
end
```

### Longjmp calls

The compiler converts C `longjmp` calls to `__wasm_longjmp` calls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, maybe we could mention emscripten_longjmp too


## Dynamic-linking consideration

In case of [dynamic-linking], it's the dynamic linker's responsibility
to provide the exception tag for this convention with the name
"env.__c_longjmp". Modules should import the tag so that cross-module
longjmp works.

[dynamic-linking]: DynamicLinking.md

## Emscripten JavaScript-based exceptions

Emscripten has a mode to use JavaScript-based exceptions instead of
WebAssembly exceptions. In that mode, `emscripten_longjmp` function,
which throws a JavaScript exception, is used instead of `__wasm_longjmp`.

```c
void emscripten_longjmp(uintptr_t env, int val);
```

The compiler translates C function calls which possibly ends up with
calling `longjmp` to indirect calls via a JavaScript wrapper which
catches the JavaScript exception.

## Implementations

* LLVM (19 and later) has a pass ([WebAssemblyLowerEmscriptenEHSjLj.cpp])
to perform the convertion mentioned above. It can be enabled with the
`-mllvm -wasm-enable-sjlj` option.

Note: as of writing this, LLVM produces a bit older version of
exception-handling instructions. (`try`, `delegate`, etc)
binaryen has a conversion from the old instructions to the latest
instructions. (`try_table` etc.)

* Emscripten (3.1.57 or later) has the runtime support ([emscripten_setjmp.c])
for the convention documented above.

* wasi-libc has the runtime support ([wasi-libc rt.c]) for the convention
documented above.

[WebAssemblyLowerEmscriptenEHSjLj.cpp]: https://github.com/llvm/llvm-project/blob/70deb7bfe90af91c68454b70683fbe98feaea87d/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[WebAssemblyLowerEmscriptenEHSjLj.cpp]: https://github.com/llvm/llvm-project/blob/70deb7bfe90af91c68454b70683fbe98feaea87d/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
[WebAssemblyLowerEmscriptenEHSjLj.cpp]: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp

Not sure if this needs to be from a specific commit version, given that we are not specifying specific lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i prefer to always use permalink as the files can be renamed/removed in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in that case we should fix the link, because it means the doc is pointing to a wrong file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, after all, what this doc refers to is today's version of the file.


[emscripten_setjmp.c]: https://github.com/emscripten-core/emscripten/blob/7d66497d96cdcffa394ad67d87f7118137edf9ab/system/lib/compiler-rt/emscripten_setjmp.c

[wasi-libc rt.c]: https://github.com/WebAssembly/wasi-libc/blob/d03829489904d38c624f6de9983190f1e5e7c9c5/libc-top-half/musl/src/setjmp/wasm32/rt.c

## Future directions

* `__WasmLongjmpArgs` can be replaced with WebAssembly multivalue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is just an internal implementation detail, right? If so, would it make sense to omit it in this document?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. as mentioned above, it's a part of the current ABI. (thus making it use multivalue is unfortunately another ABI change.)


* Or, alternatively, we can make `__wasm_setjmp_test` take the
`__WasmLongjmpArgs` pointer so that we can drop the `__WasmLongjmpArgs`
structure layout from the ABI.

* It might be simpler for the complier-generated catching logic to rethrow
the exception with the `rethrow`/`throw_ref` instruction instead of
calling `__wasm_longjmp`. Or, it might be simpler to make
`__wasm_setjmp_test` rethow the exception internally.

* If/When WebAssembly exception gets more ubiquitous, we might want to move
the runtime to compiler-rt.