This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Remove rethrow #127

Closed
RossTate opened this issue Sep 18, 2020 · 16 comments

Comments

@RossTate
Contributor

With the new change, every program with rethrow is semantically equivalent to a program without rethrow with minimal changes. Meanwhile, rethrow is a rather complicated instruction due to its context-sensitivity. It complicates the spec by requiring catch blocks to be tracked on the stack. It complicates implementations by requiring (streaming) engines to keep a copy of the exception's payload on the stack until the end of the catch is reached in case it gets rethrown. So removing rethrow will not lose any functionality and will simplify the spec and improve implementations. It might also make transforming programs easier because there will be no need to preserve the location of rethrow instructions within matching catch blocks (or to reindex them to match changes to nesting catch blocks).

The primary argument for rethrow has involved stack traces. Stack tracing should not be done implicitly (at least in production mode) because it can be quite costly, especially for languages that use exceptions as dynamic control flow rather than just for errors. As for debug mode, there are other ways to trace stacks, including techniques that will work just as well without rethrow.

As an example of exceptions as non-error control flow, there are two ways to implement for-each loops (using iterators, not generators, so I'm not going into algebraic effects here). One is to check whether the iterator has more elements before each iteration of the loop. The other is to just keep getting more elements from the iterator until it throws a "no more elements" exception. Provided throwing an exception is cheap, the latter can outperform the former even in very small loops because it eliminates an interface-method call in each iteration.
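A minimal Java sketch of the two for-each strategies, assuming a standard `java.util.Iterator` (whose `next()` throws `NoSuchElementException` once the iterator is exhausted). The class and method names are illustrative only; the sketch shows the shape of each loop and is not a benchmark.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ForEachStrategies {
    // Strategy 1: ask the iterator before each step.
    // Costs one extra hasNext() interface call per iteration.
    static int sumWithHasNext(Iterable<Integer> xs) {
        int sum = 0;
        Iterator<Integer> it = xs.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Strategy 2: keep calling next() and treat the "no more elements"
    // exception as the normal loop exit. Only competitive if throwing is cheap.
    static int sumUntilExhausted(Iterable<Integer> xs) {
        int sum = 0;
        Iterator<Integer> it = xs.iterator();
        try {
            while (true) {
                sum += it.next();
            }
        } catch (NoSuchElementException done) {
            // Not an error: the iterator is simply exhausted.
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        System.out.println(sumWithHasNext(xs));    // 10
        System.out.println(sumUntilExhausted(xs)); // 10
    }
}
```

By default, constructing a `NoSuchElementException` fills in a stack trace, so the second version pays that cost on every loop exit.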

This latter implementation of for-each illustrates how to effectively use exceptional control flow in library and language design. But its performance is very sensitive to the time it takes to throw exceptions. As an example, Section 9.2 of this paper found that disabling stack tracing in the JVM made one of its exception-intensive libraries, which implements regex parsing, 6x faster (unfortunately, it only measures one library).
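On the JVM, one way to opt out of stack tracing for a specific exception class is the four-argument `RuntimeException(message, cause, enableSuppression, writableStackTrace)` constructor: passing `writableStackTrace = false` skips trace capture, and a preallocated instance also avoids an allocation per throw. The sketch below uses a hypothetical cursor (`RangeCursor` and `NoMoreElements` are made-up names) and is not the setup measured in the cited paper.

```java
// Control-flow signal that skips stack-trace capture.
class NoMoreElements extends RuntimeException {
    // One shared instance: it carries no per-throw data, so reuse is safe.
    static final NoMoreElements INSTANCE = new NoMoreElements();

    private NoMoreElements() {
        // Arguments: message, cause, enableSuppression, writableStackTrace.
        // writableStackTrace = false means no stack trace is recorded.
        super(null, null, false, false);
    }
}

// Hypothetical cursor over [start, end) that signals exhaustion by throwing
// the preallocated, stackless exception.
class RangeCursor {
    private int cursor;
    private final int end;

    RangeCursor(int start, int end) {
        this.cursor = start;
        this.end = end;
    }

    int next() {
        if (cursor >= end) {
            throw NoMoreElements.INSTANCE;
        }
        return cursor++;
    }
}

class StacklessDemo {
    static int sum(RangeCursor c) {
        int sum = 0;
        try {
            while (true) {
                sum += c.next();
            }
        } catch (NoMoreElements done) {
            // Loop exit.
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sum(new RangeCursor(0, 5)));                     // 10
        System.out.println(NoMoreElements.INSTANCE.getStackTrace().length); // 0
    }
}
```

Reusing one instance only works because the signal carries no payload; an exception with per-throw data would need a fresh (still stackless) instance each time.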

Hopefully this 6x difference illustrates that, debug-mode aside, if we want people targeting WebAssembly to be able to reasonably account for performance tradeoffs in devising their compilation strategy, then WebAssembly engines need to be consistent about whether/how they perform stack tracing. And if we want this proposal to be a good target for exception-intensive programs/languages, then engines should not be implicitly performing stack tracing. If they're not, then rethrow serves no utility.

@tlively
Member

tlively commented Sep 18, 2020

@backes how willing would V8 be to not attach stack traces to WebAssembly exceptions in the name of improving performance? Are implicit stack traces an important part of any debugging story? Would V8 be willing to remove implicit stack traces in the future once languages can use two-phase exceptions and collect their own stack traces? Concrete implementer guidance on this would be very helpful.

With the new change, every program with rethrow is semantically equivalent to a program without rethrow with minimal changes.

There is not broad agreement on either the semantic-equivalence assertion or the minimal-changes assertion, so it would be good to dive into those more.

One iffy situation we came up with is a program that conditionally swallows a foreign (e.g. JS) exception. Since it is a foreign exception, there would be no way to extract its payload, put the payload back on the stack, and throw it again. With rethrow, this is not a problem because the program can have a conditional rethrow instruction in a catch_all block. Without rethrow, the program could not use catch_all for this, so it would have to use unwind and somehow conditionally abort the unwind. We haven't (afaik) discussed or agreed on a mechanism for aborting unwinds, so unless there's something clever that I'm missing, I think this example disproves the semantic equivalence assertion.

@RossTate
Contributor Author

Great questions, @tlively!

Without rethrow, the program could not use catch_all for this, so it would have to use unwind and somehow conditionally abort the unwind.

You can simply (conditionally) branch out of an unwind. We will need to support such functionality because finally clauses can return or break in many languages (where it indeed effectively aborts the unwinding process).

@aheejin
Member

aheejin commented Sep 18, 2020

I said all these in #126 but I'll repeat.

  1. We don't require stack traces to VMs. We are just providing an option for some of them. VMs can choose not to do this for speed, or they can choose another method if they want. But removing rethrow is preventing some VMs that want to embed stack traces in exceptions, like Java and JavaScript, from doing that, effectively removing one viable option. By removing rethrow, you are effectively forcing VMs to use your specific methods (using global state?) to do stack traces.

  2. Stack traces are mostly for debug builds. Debug builds are, by definition, sacrificing speed for debuggability. VMs can choose not to embed stack traces in release mode.

  3. Without rethrow, we would have to put the exception arguments back on the stack in the same order to rethrow them, which requires more instructions and bookkeeping.

  4. That there may exist a way to collect stack traces other than embedding stack traces in exceptions is not a strong enough argument to remove an existing instruction being used. As I said, we don't even mandate it.

  5. We made a new spec two days ago, and I even scheduled multiple individual meetings with you to make sure you agree before the CG meeting. While constant language design experiments can be a fun task for you, we have real customers who have been waiting for years, and we have limited time for bikeshedding after we make an important decision, unless we discover a very serious problem we haven't thought about. And I don't think "embedding stack traces is slow" is that argument.


Added later:
  6. catch_all does not extract values, so the only way to rethrow within catch_all is using rethrow.

@RossTate
Contributor Author

But removing rethrow is preventing some VMs that want to embed stack traces in exceptions, like Java and JavaScript, from doing that, effectively removing one viable option.

Java, C#, and Python all need stack traces to be both in terms of surface-level code (not wasm code) and to be explicit in the payload and directly accessible. The rethrow instruction in no way addresses their needs.

Stack traces are mostly for debug builds.

Debugging is generally supported in two parts: support from the execution environment, and support from the application. The latter involves compiling the application differently. The former generally either interacts with annotations in the differently-compiled application or employs guesswork when those hooks are not present.

One way applications can be compiled differently to give better stack traces in debug builds is to make the stack trace explicit in the payload. (You can even always have an exnref stack trace in the payload and just have it be null in production mode so that most of the compilation is unaffected by the mode.)
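A rough sketch of the "explicit trace in the payload" approach, written in Java as a stand-in for a compiled surface language. It assumes the compiler emits its own exception type whose payload includes a list of surface-level frame names (a simplification of the `exnref` trace mentioned above) that is `null` in production builds; `SurfaceException`, `ExplicitTrace`, and `debugMode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception type a compiler might emit for a surface language.
// The surface-level trace is part of the payload; in release builds it is
// null, so the rest of the generated code does not depend on the build mode.
final class SurfaceException extends RuntimeException {
    final List<String> trace; // surface-level frame names, or null in release mode

    SurfaceException(String message, List<String> trace) {
        // Skip the JVM's own stack capture: the trace that matters is explicit.
        super(message, null, false, false);
        this.trace = trace;
    }
}

final class ExplicitTrace {
    // Hypothetical build-mode switch chosen by the compiler, not the engine.
    static boolean debugMode = true;

    // Throw site: start a trace in debug mode, leave it null in release mode.
    static SurfaceException raise(String message, String frame) {
        List<String> trace = null;
        if (debugMode) {
            trace = new ArrayList<>();
            trace.add(frame);
        }
        return new SurfaceException(message, trace);
    }

    // Handler that propagates: an ordinary throw of a new exception with the
    // same message, extending the trace only if one is being kept.
    static SurfaceException propagate(SurfaceException caught, String frame) {
        List<String> trace = caught.trace;
        if (trace != null) {
            trace = new ArrayList<>(trace);
            trace.add(frame);
        }
        return new SurfaceException(caught.getMessage(), trace);
    }

    public static void main(String[] args) {
        SurfaceException first = raise("boom", "parse");
        System.out.println(propagate(first, "load").trace); // [parse, load]
    }
}
```

Generated code would `throw ExplicitTrace.raise(...)` at throw sites and `throw ExplicitTrace.propagate(e, ...)` in handlers that pass the exception on. Neither path needs a dedicated rethrow, and in release mode the only extra cost is carrying a null field.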

As for debug support from engines, when running an application in debug mode an engine can either give less stack trace information (e.g. only the stack trace from the last handler/unwinder, which is what some systems do), or maintain a stack trace in a global rather than as part of the payload, with heuristics as to when to clear vs. update the stack trace (which can easily be done to support catch and throw patterns, without needing rethrow).

So rethrow is not necessary for getting good stack traces in debug mode either.

we have real customers who have been waiting for years

rethrow is a complex addition to the spec. If it's not necessary, not adding will make for faster turnaround. (Yes, I understand that there is an existing instruction that happens to have the same name, but the two are completely different in how they can be used. The uses of the old rethrow will either be eliminated, made unnecessary by unwind, or easily replaced with throw $__cpp_exception, which is not restricted to be only used in certain locations.) So spending a bit of time to potentially save a lot of time seems like a worthwhile investment.

@aheejin
Member

aheejin commented Sep 18, 2020

rethrow is a complex addition to the spec. If it's not necessary, not adding will make for faster turnaround. (Yes, I understand that there is an existing instruction that happens to have the same name, but the two are completely different in how they can be used. The uses of the old rethrow will either be eliminated, made unnecessary by unwind, or easily replaced with throw $__cpp_exception, which is not restricted to be only used in certain locations.) So spending a bit of time to potentially save a lot of time seems like a worthwhile investment.

Not sure what you're talking about, but you are trying to remove an existing instruction, not add a new one. Also, I don't understand what you mean by old rethrow and new rethrow; there is one rethrow. (I used the word resuming in the previous issue, but this distinction becomes meaningful only when we introduce two-phase unwinding, which we aren't gonna do right now)

For others, I don't think I need to repeat myself again what I wrote here: #127 (comment)

@aheejin
Member

aheejin commented Sep 18, 2020

Without rethrow, the program could not use catch_all for this, so it would have to use unwind and somehow conditionally abort the unwind.

You can simply (conditionally) branch out of an unwind. We will need to support such functionality because finally clauses can return or break in many languages (where it indeed effectively aborts the unwinding process).

I don't think we can use unwind as a replacement for catch_all. When we have two-phase unwinding, catch_all is found by the first-phase search, and this stops the first phase and starts the second phase. But unwind is skipped by the first-phase search, preserving the stack. If we use unwind as a replacement for catch_all, it is not going to be discovered by the first-phase search.

@dschuff
Member

dschuff commented Sep 18, 2020

  1. ☝️ what @aheejin said;
    I had thought that we might not want to allow branching out of unwind in that way, since (once we extend to 2-phase) it could allow a case where you'd run the search phase up to some point in the stack and discover a catch (or not), but then an unwind halfway through the second phase could abort the unwinding process before it reaches the destination discovered by the first phase. This could happen regardless of whether there is also a catch with a filter for that frame or not. In other words, the filtering mechanism could say that this frame doesn't catch the exception but the unwind effectively catches it anyway by aborting the unwind; or there could be no filter at all (which I guess is the case @aheejin mentioned).
    I suppose that might not be considered unsound, but I don't know what the use case for it would be.

  2. backing up a bit; as @rossberg said, there is a use case for rethrowing a foreign exception, which you said is subsumed by try/unwind. How would that work? We have 3 use cases:

  • a) An exception cannot be caught by this frame, but destructors are run. This is now unwind, allowing the block to end and unwinding to continue. That would be unchanged in a 2-phase world.
  • b) A known exception may be caught: currently this is catch with a tag, and then a conditional rethrow. For the purposes of this discussion (i.e. ignoring stack traces and performance), it could be catch with tag and then a conditional fresh throw of the values. With 2-phase, the condition would be moved to some filtering mechanism.
  • c) A foreign exception may be caught. Currently if it is caught, this is catch_all and a regular exit from the block e.g. via branch. If it is not caught, it is a rethrow. Are you proposing that the catching case could be unwind with a branch exit, and the non-catching case could be unwind with a non-branch exit, thus continuing unwinding? This would mean that if we do a straightforward extension to 2-phase, the first phase would be unable to tell the difference between a) and c).
    Actually now that I have written this, it's maybe just a more detailed version of what @aheejin said. But I would like to clarify what you are actually suggesting.
  3. There's also still the use case @rossberg gave in Proposed spec changes + rethrow question #125 (comment)

@tlively
Member

tlively commented Sep 19, 2020

Please excuse the length of this post. I mean it to be exhaustive, but I try to keep each point brief.


Collecting what I believe to be @RossTate's main arguments against rethrow and my own responses to them:

  1. rethrow is redundant - "Every program with rethrow is semantically equivalent to a program without rethrow with minimal changes."

    • So far we have no counterexample for this if we discount differences in unspecified semantics like changes to stack traces a VM may construct (and also assume reasonable things such as that branching out of an unwind halts the unwinding.)
  2. "It complicates the spec by requiring catch blocks to be tracked on the stack"

    • I'm not very concerned about spec complications on their own, as opposed to implementation or future compatibility concerns.
  3. It requires streaming engines to keep a copy of the exception's payload on the stack until the end of the catch is reached in case it gets rethrown.

    • I can't speak to how annoying this is to implement, but it doesn't really sound that bad. I could even imagine it being simpler to treat rethrowing and non-rethrowing catches uniformly. Streaming engine codegen quality isn't so important, either.
  4. It might also make transforming programs easier because there will be no need to preserve the location of rethrow instructions within matching catch blocks (or to reindex them to match changes to nesting catch blocks).

    • For Binaryen at least, this won't make a difference. In general this sounds similar to the problem of keeping track of indices on branch instructions, which everyone has to deal with anyways.
  5. Not adding rethrow will improve turnaround time on implementations.

    • I don't think the additional time it will take to update rethrow semantics will be very significant compared to the overall time it will take to update the rest of the EH semantics, at least for tools. I expect that would be true of engines as well.

On the other side, collecting what I believe to be the main arguments for keeping rethrow and my own responses to them:

  1. Rethrow is not redundant.

    • So far we have no example that demonstrates this.
  2. Some languages may want to use rethrow to propagate auxiliary information along with their exceptions.

    • So far we don't have any specific languages that would want this. Also, this argument is incompatible with the fact that there is no specified auxiliary information and engines can choose not to have any auxiliary information at all.
  3. Having both throw and rethrow offers more options to language implementors.

    • Strictly true, but this would be more compelling if we had a specific example where rethrow is much simpler to emit than the corresponding code without rethrow. So far I haven't seen that it takes anything more than a branch, putting any extracted arguments back on the stack, and a throw to emulate rethrow's specified semantics, and that doesn't seem too complicated.
  4. Putting the extracted exception arguments back on the stack to be rethrown might not be easy.

    • I don't see how this could be complicated, given that the compiler must know what arguments it extracted and where it put them if it is going to use them for anything.
  5. rethrow is better for code size than the equivalent patterns that would be needed without rethrow.

    • Strictly true, but likely to be negligible.
  6. rethrow helps engines provide good debugging information by making it trivial to preserve the original call site for rethrown exceptions.

    • In the long run, this doesn't seem very important because two-phase exceptions and other debugging initiatives will provide even better debugging experiences than any stack trace the engine implicitly constructs. It would be great to get implementor feedback (cc @backes) about whether Web engines would be willing to not collect stack traces in the short or long term to improve performance, or if they feel a need to always be able to provide an internal stack trace. If they plan to provide stack traces, it would also be good to get implementor feedback on how much they would need to depend on rethrow to provide useful stack traces.
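As an analogy only (Java is not Wasm, and Wasm's `rethrow` specifies no stack trace at all), the sketch below contrasts re-raising the caught object with the emulation described above: branch, put the extracted arguments back, and throw anew. `AppException` and `RethrowVsThrow` are hypothetical names. Both paths propagate an equal payload; what differs is the JVM's implicitly recorded stack trace, the kind of unspecified auxiliary information the debugging argument in item 6 relies on.

```java
// Hypothetical exception whose payload is two "extracted arguments".
final class AppException extends RuntimeException {
    final int code;
    final String detail;

    AppException(int code, String detail) {
        super(detail);
        this.code = code;
        this.detail = detail;
    }
}

final class RethrowVsThrow {
    static void origin() {
        throw new AppException(42, "disk full");
    }

    // Re-raise the caught object: the trace recorded in origin() is kept.
    static void rethrowSame(boolean swallow) {
        try {
            origin();
        } catch (AppException e) {
            if (!swallow) throw e;
        }
    }

    // Emulation without rethrow: branch, put the extracted arguments back,
    // and throw a new exception. The payload is equal, but the JVM records a
    // new trace whose top frame is this handler rather than origin().
    static void throwFresh(boolean swallow) {
        try {
            origin();
        } catch (AppException e) {
            int code = e.code;
            String detail = e.detail;
            if (!swallow) throw new AppException(code, detail);
        }
    }

    // Run an action and return the AppException it throws, or null.
    static AppException capture(Runnable action) {
        try {
            action.run();
            return null;
        } catch (AppException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println(capture(() -> rethrowSame(false)).getStackTrace()[0].getMethodName()); // origin
        System.out.println(capture(() -> throwFresh(false)).getStackTrace()[0].getMethodName());  // throwFresh
    }
}
```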

Overall, the most compelling argument by far against keeping rethrow is that it is redundant. I suggest we double down on either convincing ourselves of that or finding a counterexample. In addition, I suggest we ask ourselves whether removing redundancy (and to a lesser extent simplifying the spec) is a valuable enough upside that we should continue to spend time trying to reach consensus on it. We shouldn't let the perfect be the enemy of the good.

The most compelling argument for keeping rethrow is that it may provide real value to users by allowing engines to produce better stack traces. But it is hard to evaluate the strength of this argument without engine implementor feedback on how necessary rethrow is for generating good stack traces and how likely it is that Web engines will forgo auxiliary stack traces entirely. No matter what engines do, though, this may not be important in the long run as better debugging stories are implemented.

@aheejin
Member

aheejin commented Sep 19, 2020

@tlively Before I make comments on each of your argument, have you checked #127 (comment)? Without rethrow, we don't even have a way to rethrow within catch_all.

@RossTate said we can use unwind for this, but I don't think that's doable for the reason I gave in that comment. What @RossTate suggests is basically equivalent to removing catch_all and replacing it with unwind as well, but they have different semantics and are not interchangeable with each other.

Before we argue on everything else, I think this alone makes removing rethrow a non-starter.

@tlively
Member

tlively commented Sep 19, 2020

@aheejin, good point, I should have addressed that specifically.

The program we are talking about wants to conditionally swallow a foreign (e.g. JS) exception. With rethrow, it could look like this:

(try
  (call $js_throws)
 catch_all
  (if (eqz (... condition ...))
    (... run destructors ...)
    (rethrow) ;; propagates foreign exception
  )
  ;; swallows foreign exception
)

Without rethrow, we've proposed that this program would do the same thing:

(try
  (call $js_throws)
 unwind
  (br_if 0 (... condition ...)) ;; swallows foreign exception
  (... run destructors ...)
  ;; reaching end of unwind propagates foreign exception
)

In the future when we add two-phase unwinding (with filter functions syntactically attached to catches for simplicity), we would want to implement it like this:

(try
  (call $js_throws)
 catch_all $condition ;; same condition, now as a filter function
  (nop) ;; swallows foreign exception
 unwind
  (... run destructors ...)
)

There's no rethrow necessary because whether or not the exception is propagated is determined solely by the filter function during the first phase. I anticipate objections that catch_all shouldn't take a filter function, but my response to that would be that we haven't spec'd two-phase exceptions yet and it doesn't seem unreasonable that catch_all could take a filter function.

So I think this demonstrates that rethrow is not strictly necessary (at least for this case), but this also demonstrates that if we remove rethrow, we will have to make a nontrivial change to codegen to support two-phase exceptions in the future. If we keep rethrow, we could adopt the structure of the two-phase case now and simplify future changes:

(try
  (call $js_throws)
 catch_all ;; no syntactic filter function yet
  (if (call $condition) ;; but we can still call a filter function manually
    (nop) ;; swallows foreign exception
   else
    (rethrow) ;; propagate foreign exception
  )
 unwind
  (... run destructors ...)
)

@RossTate, wdyt? The ability to adopt the code structure of two-phase exception handling now and simplify future upgrades from single-phase to two-phase unwinding seems to me to be well worth the redundancy of keeping rethrow.

@aheejin
Member

aheejin commented Sep 19, 2020

@tlively I'm not sure if we can make code structure as you suggested; I commented inline.

The program we are talking about wants to conditionally swallow a foreign (e.g. JS) exception. With rethrow, it could look like this:

(try
  (call $js_throws)
 catch_all
  (if (eqz (... condition ...))
    (... run destructors ...)
    (rethrow) ;; propagates foreign exception
  )
  ;; swallows foreign exception
)

Is this the new spec (as of this week's CG meeting)? In the new spec, we put destructors not in catch_all but in unwind. In single phase they do the same thing, but we would put destructors in unwind for preparation anyway.

In the future when we add two-phase unwinding (with filter functions syntactically attached to catches for simplicity), we would want to implement it like this:

(try
  (call $js_throws)
 catch_all $condition ;; same condition, now as a filter function
  (nop) ;; swallows foreign exception
 unwind
  (... run destructors ...)
)

There's no rethrow necessary because whether or not the exception is propagated is determined solely by the filter function during the first phase. I anticipate objections that catch_all shouldn't take a filter function, but my response to that would be that we haven't spec'd two-phase exceptions yet and it doesn't seem unreasonable that catch_all could take a filter function.

  • A try does not take both catch/catch_all and unwind.
  • Whether catch_all should have a filter aside, I'm not sure if we can freely extract a condition within a catch/catch_all block as a filter function. In two-phase unwinding, filter functions are run in the first phase, whereas catch bodies run after we're done with all search and enter a catch body. This can change the order of executions and thus semantics. (We haven't spec'ed two-phase yet, but this is kind of the common definition of two-phase)
  • Catching an exception and rethrowing do not occur exclusively; it may not be as simple as swallowing vs. rethrowing. You may want to catch a foreign exception, do something (e.g. print some message), and then rethrow. We can't split only the rethrowing event into unwind.
  • Also as you pointed out, extracting some random condition as a function is a nontrivial transformation.

@tlively
Member

tlively commented Sep 20, 2020

Is this the new spec (as of this week's CG meeting)? In the new spec, we put destructors not in catch_all but in unwind. In single phase they do the same thing, but we would put destructors in unwind for preparation anyway.

  • A try does not take both catch/catch_all and unwind.

Yes, this is with the new spec. I'm not actually sure how unwind is meant to syntactically relate to catch and catch_all, so letting a single try have both was some guesswork on my part, based on the observation that destructors might still need to be run if the exception is caught and rethrown. Can you share how these example programs would properly be written?

  • Whether catch_all should have a filter aside, I'm not sure if we can freely extract a condition within a catch/catch_all block as a filter function.
  • Also as you pointed out, extracting some random condition as a function is a nontrivial transformation.

I'm not arguing that it's possible to transform arbitrary single-phase programs into equivalent two-phase programs by extracting filter conditions. I'm arguing that if the toolchain knows what filter function it wants to run, then if we have rethrow, both single-phase and two-phase handling can have very similar codegen. In particular, the single-phase structure just requires an extra if-else wrapping the catch body with the filter as the condition and rethrow as the else body.

  • Catching an exception and rethrowing do not occur exclusively; it may not be as simple as swallowing vs. rethrowing. You may want to catch a foreign exception, do something (e.g. print some message), and then rethrow. We can't split only the rethrowing event into unwind.

Right, my example programs were specific to the behavior of swallowing the exception without doing anything else, but they can be generalized by putting arbitrary work before the ;; swallows foreign exception lines.

@aheejin
Member

aheejin commented Sep 20, 2020

@tlively

I'm not sure I understand what your point is. Also, I'm not sure what your (and Ross's) definition of 'redundancy' is.

I'm not arguing that it's possible to transform arbitrary single-phase programs into equivalent two-phase programs by extracting filter conditions. I'm arguing that if the toolchain knows what filter function it wants to run, then if we have rethrow, both single-phase and two-phase handling can have very similar codegen. In particular, the single-phase structure just requires an extra if-else wrapping the catch body with the filter as the condition and rethrow as the else body.

I'm not sure if I understand. I didn't think we were talking about translating an arbitrary single-phase program into an equivalent two-phase program; I thought we were talking about whether it's possible to translate an arbitrary program with rethrow into an equivalent program without rethrow, in a way that works in both single- and two-phase. And I'm arguing that it is not feasible.

  1. As I said, catch_all and unwind have different semantics in two-phase, so we can't use them interchangeably. Extracting arbitrary conditions into filter functions, whether it's doable or not aside, wouldn't have the same semantics as the original program, because of the execution order. Filters run in the first phase, while user handlers run after both first/second phase search.

  2. Even if we completely ignore this two-phase semantics and assume there's only single phase in the world, a catch body may not be as simple as what you suggest; it can be a combination of any number of conditions and any number of rethrows and random user code in between.

try
  ...
catch_all
  if (some condition)
    if (some other condition)
      rethrow
    else
      if (third condition)
        break
      else
        rethrow
  ...
end

If someone argues we can remove rethrow because we can go under some complex transformations, I wouldn't describe that instruction as redundant. Wasm is not supposed to be a canonical set of instructions with behaviors that cannot be composed out of other instructions with whatever cost. But I'm worried that the point of the discussion gets focused on whether this complex transformation is possible or not; as I said in 1, catch_all and unwind have different semantics in two-phase to begin with.

@tlively
Member

tlively commented Sep 20, 2020

@aheejin, I think we should chat offline about some of these points (on Monday) to make sure we're on the same page, but I don't think we're really disagreeing with each other either, so I don't want to spend a whole lot of space on this thread going back and forth. It would still be helpful if you could share how the example programs would be written with the correct unwind syntax, though.

@RossTate
Contributor Author

@aheejin and @dschuff bring up some good concerns regarding catch_all, and in particular how it will extend to two-phase exception handling. I too have been wondering how catch_all will extend to more expressive exception systems. I've been digging through research and languages that have already explored this space, and it unfortunately does not look simple. I've opened #128 to focus more specifically on catch_all. #128 does discuss rethrow within catch_all, among other things, and as part of that illustrates how syntactic changes can have semantic effects in the long run, and why rethrow might be undesirable for reasons unrelated to stack tracing.

Given the new issue focused on catch_all, I'd prefer to focus this issue from here on out on removing rethrow from catch. The last comment on that specific topic seems to be this one by @tlively, which I found to be a great summary of that more specific topic. It also asks for feedback from implementers regarding stack traces, which I too would be very interested in hearing.

(@dschuff, your item 3 here is addressed by the fact that surface-language's catch would translate to a wasm catch $exn, with a specific $exn, and so the surface language's rethrow could translate to a throw $exn, though possibly with a different payload depending on the surface language's semantics for rethrowing.)

@tlively
Member

tlively commented Sep 21, 2020

I'd prefer to focus this issue from here on out on removing rethrow from catch

What's the benefit of removing rethrow from catch if we're keeping it for catch_all?


4 participants