fgmccabe
Collaborator

Now uses blocks to signal event handlers

Renamed to fibers

Added explanation of how to implement async/await

Also did some wordsmithing
@RossTate

The exposition in this is really good! And many of the design changes look good as well. However, there are a few items that I think need to be addressed.

The first item is handle design. I understand that handle blocks were introduced in order to make the fibre.new instruction simpler, but that instruction is where its use is most problematic. I believe in the rest of WebAssembly, the validity of any instruction that references a function only depends on the type of that function, not its definition. However, you've made the requirement that the referenced function has a top-level handle block. What if the function were imported? Validation aside, semantically this is a problem as well. For example, with this design, inlining a direct function call can change the semantics of the program by pulling a handle block to the top level. (Inlining is also problematic for introducing nesting of handle blocks, which by the current semantics would change the semantics of the program.) A design for fibre.new that addresses all these issues without requiring handle blocks or new kinds of "suspendible" functions is one that specifies an event-function table that, for each event, specifies the function to be executed should the fibre be first resumed with that event.
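
As a rough illustration of the event-function table idea, here is a minimal Python sketch. It is not part of the proposal: `Fibre`, `fibre_new`, `first_resume`, and `Trap` are hypothetical names, and events are modelled as plain strings. The point it tries to show is that the set of events a new fibre can first be resumed with is determined by the table alone, not by the body of any referenced function.

```python
from typing import Any, Callable, Dict


class Trap(Exception):
    """Stands in for an engine trap (hypothetical name)."""


class Fibre:
    """A not-yet-started fibre created from an event-function table."""

    def __init__(self, entry_table: Dict[str, Callable[[Any], Any]]):
        # The table, not the body of any function, says which events the
        # new fibre can first be resumed with.
        self.entry_table = dict(entry_table)
        self.started = False

    def first_resume(self, event: str, payload: Any) -> Any:
        if self.started:
            raise Trap("fibre already started")
        entry = self.entry_table.get(event)
        if entry is None:
            raise Trap(f"no entry function for event {event!r}")
        self.started = True
        return entry(payload)


def fibre_new(entry_table: Dict[str, Callable[[Any], Any]]) -> Fibre:
    """Model of fibre.new taking an event-function table (an assumption)."""
    return Fibre(entry_table)
```

For example, `fibre_new({"start": lambda n: n + 1}).first_resume("start", 41)` evaluates to `42`, while a first resume with an event missing from the table raises `Trap`.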

The handle block is also inconsistent with existing WebAssembly design. Sometimes that's fine, but there doesn't seem to be a particular reason for the deviation here. handle essentially sets up a switch table. But whereas the existing instruction for switch tables—br_table—separates the creation of the labels (which is done by block and loop and such) from the specification of the table, handle bundles the two together. If event-switching tables were expected to be reused often, this might be useful for keeping size down and avoiding redundant tables, but the use cases people have given us suggest that event-switching tables are expected to be different on most suspensions even within the same function (and will be rather small anyways). So having suspending functions specify their event-switching tables makes better use of Wasm's existing control constructs and is likely to be more compact.

The second item is return types. At present WebAssembly has no parameterized value types. Adding parameterized value types adds a lot of complexity. There might be occasions that merit it, but this use seems to come with additional complexity and no significant benefits. Returning is just another way for a fibre to transfer control to its parent. While this parameterization avoids the need to check a tag, we've seen from other experiments that that tag check costs nothing. But having a "return" case adds an additional case to many items throughout the design and the spec. For example, at present you can fibre.retire to the parent through an event, but not through returning, so there's an operation that cannot be implemented (without substantial cost) through the existing ones due to this additional case. It also makes fibre.switchto more expensive because you have to dynamically check that the tasks that you're switching between have the same return type. The "hole" that return types seem to fill is what to do when the function(s) used by fibre.new and fibre.spawn return, but I think that hole is better filled by either requiring those functions to not return (i.e. unreachable result) or specifying that the program traps if they return. That said, I see there being utility in having suspending instructions specify a distinguished event whose payload is given directly to the following instruction rather than some label. This also cleanly ensures that every suspended fibre is resumable with some event, avoiding some awkward not-resumable-but-also-not-moribund state.

The third item is a corner case: what to do when the fibre being transferred to does not handle the event at hand? I know of some use cases where it's useful for the currently executing fibre to be able to try again with another event. So I would suggest transferring instructions (optionally?) specify a label to transfer control to (within the currently executing fibre) when the event is unhandled (by the target fibre); if no label is specified, then the currently executing fibre traps.

The final item is syntactic: I wonder if we might want to change the names of fibre.resume and fibre.suspend. There are many instructions besides these that resume and suspend fibres, and I worry that it will be confusing to have to regularly clarify whether "resume/suspend" means the instruction or the operation. Off the top of my head, I'd recommend fibre.mount and fibre.dismount as replacements.

Sorry to focus on the negatives; it's just that the positives don't need any changes! (Also, I ran out of time and haven't read over the examples yet.)

@fgmccabe
Collaborator Author

Thank you for the rapid review; much appreciated.

Some specific responses:

1. > However, you've made the requirement that the referenced function has a top-level handle block. What if the function were imported?

   This is not a validity requirement; I am proposing to reinterpret the function's top-level block as a handle block. In fact, it amounts to the same requirement placed on any code that is supposed to respond to a switch event. There is a dynamic switch to the appropriate event handler. That switch is determined entirely by the event tag of the switch event, but its validity is guaranteed by the type signatures. So a fiber.new function lacking the right event block is equivalent to a fiber lacking the right event block for a regular resume.

2. > ... pulling a handle block to the top level.

   In fact, if a function's top-level block were also viewed as a handle block, then inlining would result in a nested handle block.

3. > ... WebAssembly has no parameterized value types.

   Function types are parameterized value types. So are ref types. So it seems to me that we are not really doing much new here. The issue with returning from a fiber is primarily one of 'filling a hole'. Requiring fiber functions to retire rather than return seems (a) unnecessarily restrictive and (b) inconsistent with any policy for propagating exceptions. Although the two are not formally linked, I would link them because allowing one seems to suggest allowing the other. IMO, propagating exceptions is not advisable; but it is already specified in JSPI and I foresee a lot of pressure to allow it.

This is definitely a case of architectural 'effect at a distance'; but I believe it's unavoidable.

I think that we can mull over some of the other suggestions...

@fgmccabe merged commit 78194e9 into main on Sep 12, 2022

#### `fiber.suspend` Suspend an active fiber

The `fiber.suspend` instruction takes a fiber as an argument and suspends the fiber. The identified fiber must either be the `active` fiber, or a resume ancestor of the active fiber.
Member

Why allow resume ancestors to be suspended? It seems like the simpler thing would be to only allow suspending the active fiber. To suspend a target ancestor, the toolchain would have to arrange for all of its descendents to suspend themselves, but that doesn't seem overly burdensome. This would also give ancestors the opportunity to save their task-local global state (like stack pointers) lazily once they know they will be suspended rather than eagerly once they know they might be suspended, which would be more efficient.

Collaborator Author

You absolutely do want to be able to suspend a resume ancestor. This shows up, for example, when combining different use cases. E.g., an async function/generator combo. You absolutely don't want to have to search the stack looking for the right place to suspend/resume - dynamic scoping brings unfortunate collisions.

Collaborator Author

Wrt stack state: for a number of engine-safety related reasons, whenever stacks are switched, all the necessary register state is stored during that switch. The total cost is approximately equivalent to a function call (+ mutex in some cases).

Member

It seems better to walk the stack (of fibers) when suspending so that stack pointers can be stashed than to have to eagerly stash and restore stack pointers even when there might not be any suspension.

Collaborator Author

You need to, if for no other reason than to ensure safety of GC.
However, this is not in any case the major cost for switching stacks. That mutex is.


The _root_ ancestor fiber does not have an explicit identifier; and so it may not be suspended.
Member

This seems like a problem for integration with JSPI. JSPI is all about suspending the root fiber by calling out to JS that returns a promise. It would be a shame if there were no way to accomplish the same thing without a call out to JS once the spec can properly describe suspension and resumption. In particular, I would hope that stack switching would give us a natural way to generalize the capabilities of JSPI beyond the JS embedder.

Collaborator Author

Not being able to suspend the root fiber is a correctness requirement: otherwise, one could end up in a situation where no forward execution is available.
However, this does not clash with JSPI. In that world, the export call creates a fiber for the export to run in. This is a 'local' root but not the global root (which is typically connected to the browser's event loop).

Member

The problem is that that "local" root does not have an identity because from this proposal's POV it is still the root, so it cannot be suspended by this proposal. So the call to a promise-returning import would still be necessary. This could be fixed by giving root fibers identity so they can be suspended. Yes, execution would return to the host at that point, but that doesn't seem like a problem. In particular, suspending all Wasm forward progress and returning control to the host is exactly what JSPI does.

Collaborator Author

Actually, if you are using JSPI, then your local root will have identity and you will be able to suspend it. And, as you pointed out, that will cause the entire app to suspend and 'return' to the browser.


The fiber that is suspended is marked `suspended`, and the immediate resume parent of that fiber becomes the active fiber.
Member

Presumably all its resume descendents are marked suspended as well?

Collaborator Author

The primary requirement is that (a) you can only suspend active fibers and (b) you can only resume a fiber that was suspended. How you achieve that is a matter of implementation. One way is to have three states: active, suspended and inuse. The latter are fibers that are between other fibers on the 'fiber' stack.

Member

Right, but if a resume ancestor of the active fiber is suspended, I would expect that all of its resume dependents down to the active fiber would also be marked suspended to become eligible for resumption. Is that right?

Collaborator Author

No. Only the fiber that actually was suspended can be resumed. The other fibers that are hanging off it may not be resumed.
When a suspended fiber is resumed, what actually happens is that the innermost fiber that was last running gets woken up, because that is where the suspended fiber was running last.
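
The behaviour discussed in this thread can be sketched as a small Python model. This is only one possible reading, using the three states mentioned above (active, suspended, inuse); `State`, `Fiber`, `resume`, `suspend`, and `Trap` are hypothetical names, not part of the proposal, and the root fiber is modelled as the one fiber with no resume parent.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    ACTIVE = auto()     # currently executing
    SUSPENDED = auto()  # explicitly suspended; the only state eligible for resume
    INUSE = auto()      # sits between other fibers on the fiber stack


@dataclass
class Fiber:
    name: str
    state: State = State.SUSPENDED
    parent: Optional["Fiber"] = None  # resume parent
    child: Optional["Fiber"] = None   # fiber this one most recently resumed


class Trap(Exception):
    """Stands in for an engine trap."""


def resume(active: Fiber, target: Fiber) -> Fiber:
    """Resume `target` from `active`; return the fiber that becomes active."""
    if target.state is not State.SUSPENDED:
        raise Trap(f"{target.name} is not suspended")
    active.state = State.INUSE
    active.child = target
    target.parent = active
    # Wake the innermost fiber hanging off the target, since that is where
    # execution last stopped.
    innermost = target
    while innermost.child is not None:
        innermost.state = State.INUSE
        innermost = innermost.child
    innermost.state = State.ACTIVE
    return innermost


def suspend(active: Fiber, target: Fiber) -> Fiber:
    """Suspend `target`, which must be `active` or one of its resume ancestors."""
    f: Optional[Fiber] = active
    while f is not None and f is not target:
        f = f.parent
    if f is None or target.parent is None:
        # Not an ancestor, or the root fiber (which has no resume parent).
        raise Trap("target is not a suspendable resume ancestor")
    # Only the target is marked suspended; fibers hanging off it stay in use.
    if active is not target:
        active.state = State.INUSE
    target.state = State.SUSPENDED
    parent = target.parent
    parent.child = None
    target.parent = None
    parent.state = State.ACTIVE
    return parent
```

In this model, if `root` resumes `a`, `a` resumes `b`, and `b` then suspends `a`, only `a` ends up `SUSPENDED` while `b` is `INUSE`. Resuming `b` directly traps, whereas resuming `a` makes `b` the active fiber again.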

Member

So that innermost fiber that gets resumed may not actually be an eligible target for the fiber.resume instruction? That seems extremely counterintuitive.

Collaborator Author

It will make sense if you think about it :)


If the resumed fiber suspended itself, then the event tag associated with that `fiber.suspend` instruction is used to determine which of the available `event` blocks should be entered as part of the switch. The 'nearest' `event` block whose tag is equal to the supplied event is entered. If there is no appropriate `event` block in the execution scope of the fiber being resumed, then the engine _traps_.
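
The lookup rule in the paragraph above can be sketched in Python as follows. This is an illustrative model only: `dispatch` and `Trap` are hypothetical names, and representing the enclosing `event` blocks as a list of dictionaries (innermost first) is an assumption made for the example.

```python
class Trap(Exception):
    """Stands in for an engine trap."""


def dispatch(event_scopes, tag, payload):
    """Enter the nearest `event` block whose tag equals `tag`.

    `event_scopes` lists the enclosing handler scopes of the resumed fiber,
    innermost first; each scope maps an event tag to a handler callable.
    """
    for scope in event_scopes:
        handler = scope.get(tag)
        if handler is not None:
            return handler(payload)
    # No enclosing event block matches the tag: the engine traps.
    raise Trap(f"no event block for tag {tag!r}")


# Usage: the inner scope shadows the outer one for the "yield" tag.
outer = {"yield": lambda v: ("outer", v), "done": lambda v: ("finished", v)}
inner = {"yield": lambda v: ("inner", v)}
result = dispatch([inner, outer], "yield", 1)  # ("inner", 1)
```

A tag that appears in no enclosing scope raises `Trap`, mirroring the trapping behaviour described in the text.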

#### `fiber.switch` Switch to a different fiber
Member

So this is like a return_call except instead of returning it is suspending and instead of calling it is resuming?

Collaborator Author

fiber.switch is similar to a combination of fiber.suspend + fiber.resume. However, the event involved is never inspected by the fiber's resume parent.

Member

Just like how the operands to return_call are never inspected by the caller of the return-calling function.

Collaborator Author

return_call has a requirement that the return types of the called function match the return types of the calling function.
However, in the case of a fiber switching between fibers, the types of the event need not be the same as the scheduler's event (or, more accurately, the scheduler does not need to know how to handle the switch event itself).
This can be important for symmetric coroutining and for implementing channels à la Go. When a channel variable gets bound, you can directly wake up the partner goroutine if it has previously blocked on the channel. But channels themselves are not necessarily known to the scheduler.

Collaborator Author

Having said all that, IMO, it is not obvious that we want this. There is another 'action at a distance' thing here: the target fiber may throw an exception. And since exceptions are propagated (ugh) that can cause the scheduler to be asked to handle an exception that it was not expecting (or worse, it can handle it incorrectly). The true cost of this risk is not easily quantifiable. IMO, we should not propagate exceptions; but that ship has sailed.


This, in turn, means that a fiber manager may be relieved of the burden of communicating between fibers. I.e., `fiber.switch` supports a symmetric coroutining pattern. However, precisely because the fiber's manager is not made aware of the switch between fibers, it must also be the case that this does not _matter_; in effect, the fiber manager may not directly be aware of any of the fibers that it is managing.

#### `fiber.retire` Retire a fiber
Member

This seems like it's only useful for exceptions, since fibers can always return nonexceptionally without any event involved. What if we made propagating an exception out of a fiber behave the same as doing a fiber.retire with the same tag as the exception? Then we might not need a separate fiber.retire instruction.

(Relatedly, is there any reason not to reuse try-catch rather than introducing handle-event? They seem awfully similar, and reusing try-catch would be the same as throwing an exception when receiving an event for which there is no handler, which does not necessarily seem like a worse choice than trapping. It also seems like it would be useful to have a handle-all analogous to catch-all.)

Collaborator Author

Exceptions have an attached policy of automatically propagating uncaught exceptions. This is not something that we want. Nor is there any equivalent of catchall, delegate etc. in the fiber model. But, otherwise, there is some similarity.
It is true that fiber.retire would be useful for exceptions, but that is not its only use. It can also be useful when escaping out of deep generators.

Member

Why is fiber.retire more useful than normal return for escaping deep generators?

It's not clear to me that the policy of trapping on unhandled events is obviously better than throwing an exception on unhandled events. I also certainly anticipate a need to recover gracefully from unexpected events, and in the fullness of time I think we would want exception-throwing versions of these primitives even if we start out with trapping primitives. (Having throwing versions of currently-trapping instructions is a top request in the GC proposal, for instance.)

Collaborator Author

That may be. At the moment, it is less risky and involves less design to trap.


>The reason that we don't recommend allowing exceptions to propagate is that an inappropriate exception handler may be invoked as a result. This is especially dangerous in the case that the retiring fiber was switched to, with a `fiber.switch` instruction, rather than being resumed.

#### `fiber.retireto` Retire a fiber and directly switch
Member

The only difference between this and doing a fiber.switch followed by a normal return is that the fiber is marked moribund and its resources freed sooner. But given that it would be a toolchain bug if a moribund fiber were resumed, the only real benefit here is the eager resource freeing. It seems that this instruction could be replaced with a fiber.switch + fiber.release.

Collaborator Author

Again, a key semantic difference is that retireto does not involve the scheduler. Also, it is not generally replaceable by switch+release because that would require that the target fiber 'knows' that the 'sending' fiber is done. Again, that is not always possible or appropriate.

Collaborator Author

Note that not involving a separate scheduler is (a) something that is a tail_call-like optimization for some languages and (b) imperative when communicating data between sibling fibers. The latter scenario allows unboxed values to be used in more situations.

@fgmccabe deleted the fibers branch on September 12, 2022 at 20:14
@fgmccabe restored the fibers branch on September 12, 2022 at 20:14
RossTate pushed a commit to RossTate/stack-switching that referenced this pull request Sep 28, 2022
@fgmccabe deleted the fibers branch on February 15, 2023 at 18:39
dhil pushed a commit to dhil/wasm-stack-switching that referenced this pull request Apr 12, 2024
dhil pushed a commit that referenced this pull request Aug 2, 2024
* [interpreter] Handle custom sections and annotations

Co-authored-by: Yuri Iozzelli

* Fix merge conflict

* Fix lexer priorities

* Fix wast.ml

* Oops

* Update wast.ml

---------

Co-authored-by: Andreas Rossberg