CFG traversal for the new EH spec #3494

aheejin · 2021-01-15T10:19:28Z

This updates CFG traversal to match the new spec. Previously there was
only a single catch block that caught all exceptions, so all throwing
instructions needed to have a link to its innermost catch BB. But now we
can have multiple catches per try, requiring all throwing instrutions to
have an edge to all of those innermost catch BBs. Furthermore, if there
are only catches and not a catch_all in a try, throwing instructions
can further unwind to outer catches until they find a catch_all.
unwindCatchStack and unwindExprStack are necessary to track and make
correct links between throwing instructions and their unwind destination
BBs.

processCatchStack is used to remember the catch BBs currently being
processed, so that after processing all of them, we can make a link from
each of those catch's last block to the continuation block after the
try-catch.

RSE test cases are updated because they use the CFG traversal. The tests
there mainly test that if all possible CFG edge to a local.set sets
the same value to a local, the local.set is redundant and thus can be
removed.

This updates CFG traversal to match the new spec. Previously there was only a single `catch` block that caught all exceptions, so all throwing instructions needed to have a link to its innermost catch BB. But now we can have multiple catches per try, requiring all throwing instrutions to have an edge to all of those innermost catch BBs. Furthermore, if there are only `catch`es and not a `catch_all` in a try, throwing instructions can further unwind to outer catches until they find a `catch_all`. `unwindCatchStack` and `unwindExprStack` are necessary to track and make correct links between throwing instructions and their unwind destination BBs. `processCatchStack` is used to remember the catch BBs currently being processed, so that after processing all of them, we can make a link from each of those catch's last block to the continuation block after the try-catch. RSE test cases are updated because they use the CFG traversal. The tests there mainly test that if all possible CFG edge to a `local.set` sets the same value to a local, the `local.set` is redundant and thus can be removed.

src/cfg/cfg-traversal.h

kripken · 2021-01-15T16:38:41Z

src/cfg/cfg-traversal.h

+  std::vector<Expression*> unwindExprStack;
+  // Contains the stack of the first blocks of catches currently being processed
+  // in the beginning. Will contain the last blocks of catches after being
+  // processed.


this comment is a little unclear to me. What does "beginning" mean?

I tried to explain this in #3494 (comment). So this vector stores catch start BBs in doStartTry so that each doStartCatch can retrieve its start BB. After processing each catch doEndCatch stores each catch's end BB so that doEndTry can access it.

If we decide to keep a single vector for this, how about

// A stack for each try, where each entry is a list of blocks, one for // each catch, used during processing. We start by assigning the // start blocks to here, and then read those at the appropriate time; // when we finish a catch we write to here the end block, so that when // we finish with them all we can connect the ends to the outside. In // principle two vectors could be used, but their usage does not overlap // in time, and this is more efficient.

Thanks! Done.

src/cfg/cfg-traversal.h

kripken · 2021-01-15T16:55:34Z

src/cfg/cfg-traversal.h

+  }
+
+  static void doEndCatch(SubType* self, Expression** currp, Index i) {
+    self->processCatchStack.back()[i] = self->currBasicBlock;
  }


I don't follow what these two methods do. The start assigns the current basic block, and the end does the reverse assignment - is it not redundant? If not, what is it writing over?

When there's N catches, doStartTry creates N BBs, each of which will serve as the start BB for each catch, and store them in unwindCatchBlock, so that any throwing instructions within the try body can create a link to them.

In doStartCatches, we move those N catch start BBs to processCatchBlock. We need this vector because when we enter each catch in doStartCatch, we should be able to retrieve the start BB created for that catch. But we can't leave those catch start BBs in unwindCatchBlock anymore because that stack is for processing unwinding, the process I described above (= any throwing instruction needs a link to every catch start BB in unwindCatchBlock.back()).

In doEndCatch, we save the end BB of each catch to processCatchStack. They are used in doEndTry that we create a link from each of those end catch BB to the continuation BB after the try-catch. Reusing processCatchStack is just to save some space, and we can use a separate vector for this; let me know if you think this is confusing and it'd be better to use separate vectors.

I'm not sure if I described this well; it was hard to describe in the comments too. If you have any suggestions to improve the comments (or the code) I'd appreciate that.

Thanks, I think I'm starting to get it.

Separate vectors is another option. Not sure it's worth the overhead, as I'm not sure how many catches we expect to see in practice...?

If we keep the same vector as now, I think comments in these two methods could be helpful, like

// Get the block that starts this catch // We are done with this catch; set the block that ends it

Even though they don't add much information, they increase the chance of it making sense when reading just those two methods, I think.

Co-authored-by: Alon Zakai <[email protected]>

aheejin · 2021-01-15T20:13:58Z

src/cfg/cfg-traversal.h

+          self->pushTask(doEndCatchI, currp);
+          self->pushTask(SubType::scan, &catchBodies[i]);
+          auto doStartCatchI = std::bind(doStartCatch, _1, _2, i);
+          self->pushTask(doStartCatchI, currp);


Asan says this line causes a memory leak, but I'm not very sure why... Do you have a guess?

Very odd, no idea... code looks 100% valid to me.

I think I figured out what's going on. Surprisingly std::function's constructor does some dynamic allocation in some cases. (It seems this behavior is dropped in C++17, but we are using C++14.) And our SmallVector does not call destructors of objects on pop_back() of clear() when they are stored in fixed part. (flexible part is fine because it's using std::vector.) It was fine until now because what we've stored in SmallVector, such as Expressions, are all managed by a separate allocation arena.

Fixing this does not seem to be trivial. We can't call destructors on all values (there are primitive types), so SmallVector has to be somewhat complicated, and I'm not sure that's worth the effort. And first of all, given that std::function can cause a dynamic allocation, I'm not sure if we should use it in the part that can be performance critical. The reason I changed

typedef void (*TaskFunc)(SubType*, Expression**);

to

using TaskFunc = std::function<void(SubType*, Expression**)>;

was the result of std::bind can be passed as a std::function but not a function pointer. The reason I used std::bind and not a lambda was a lambda capturing variables cannot be passed as a function argument. (Here we need to capture i)

Maybe we should just use lambdas and make i a class variable or something, which seems less cleaner, but maybe we can't help..

Oh wow, good find... C++ is annoying sometimes.

Yes, lambdas seem best for now, I can't think of a better option.

I changed it to lambdas: #3494 (comment)
I was a little confused. I thought capturing lambdas couldn't be passed as a function argument but that turned out to be only the case when we use the function pointer as TaskFunc. After we changed it to std::function it works fine. Also, not sure why, but the asan memory leak error does not appear in the CG anymore 🤷🏻 (Was that linked to a condition related to std::bind? No idea) So I guess we will go with this.

About possible performance issues, I ran all wasm-opt tests with 1. TaskFunc being a function pointer and 2. TaskFunc being a std::function, in release build. The performance difference doesn't seem to be very concerning. WDYT? All numbers are measured three times but the first result was thrown out of concerns that it might be affected by the cache warming up.

TaskFunc is a function pointer

real 0m56.546s user 0m46.372s sys 1m20.068s real 0m57.272s user 0m46.733s sys 1m21.784s

TaskFunc is a std::function

real 0m58.454s user 0m47.954s sys 1m22.177s real 0m57.924s user 0m47.401s sys 1m22.224s

kripken · 2021-01-15T21:37:51Z

src/cfg/cfg-traversal.h

+  }
+
+  static void doEndCatch(SubType* self, Expression** currp, Index i) {
+    self->processCatchStack.back()[i] = self->currBasicBlock;
  }


Thanks, I think I'm starting to get it.

Separate vectors is another option. Not sure it's worth the overhead, as I'm not sure how many catches we expect to see in practice...?

If we keep the same vector as now, I think comments in these two methods could be helpful, like

// Get the block that starts this catch // We are done with this catch; set the block that ends it

Even though they don't add much information, they increase the chance of it making sense when reading just those two methods, I think.

kripken · 2021-01-15T21:41:46Z

src/cfg/cfg-traversal.h

+  std::vector<Expression*> unwindExprStack;
+  // Contains the stack of the first blocks of catches currently being processed
+  // in the beginning. Will contain the last blocks of catches after being
+  // processed.


If we decide to keep a single vector for this, how about

// A stack for each try, where each entry is a list of blocks, one for // each catch, used during processing. We start by assigning the // start blocks to here, and then read those at the appropriate time; // when we finish a catch we write to here the end block, so that when // we finish with them all we can connect the ends to the outside. In // principle two vectors could be used, but their usage does not overlap // in time, and this is more efficient.

kripken · 2021-01-15T21:42:50Z

src/cfg/cfg-traversal.h

@@ -205,47 +211,79 @@ struct CFGWalker : public ControlFlowWalker<SubType, VisitorType> {
    self->startUnreachableBlock();
  }

-  static void doEndCall(SubType* self, Expression** currp) {
+  static void doEndThrowingInst(SubType* self, Expression** currp) {
    // Every call can possibly throw, but we don't end the current basic block


call is no longer accurate throughout this comment

Changed to

Even if the instruction can possibly throw, we don't end the current basic block unless ...

kripken · 2021-01-15T21:47:38Z

src/cfg/cfg-traversal.h

+          self->pushTask(doEndCatchI, currp);
+          self->pushTask(SubType::scan, &catchBodies[i]);
+          auto doStartCatchI = std::bind(doStartCatch, _1, _2, i);
+          self->pushTask(doStartCatchI, currp);


Very odd, no idea... code looks 100% valid to me.

tlively

Looks good overall! I think @kripken's ideas for the comments are good, too.

tlively · 2021-01-15T22:17:32Z

src/cfg/cfg-traversal.h

+      // thrown can be caught by any of its outer catch block. And if that outer
+      // try-catch also does not have a catch_all, this continues until we
+      // encounter try-catch_all. Create a link to all those possible catch
+      // unwind destinations.


Is it necessary to keep track of event types here so that a link is only created to the innermost catch block that handles each event type? AFAIK that would be more precise, but I don't know how precise this CFG needs to be. If it's not necessary now, it might be worth mentioning in a TODO if it would help with any optimizations later.

I think we can do something like that for throws, but not calls, because we don't know what event type will be thrown from calls. And given that the vast majority of throwing instructions will be calls, I'm not sure how much impact that can have, but I'll add a TODO here:

// TODO This can be more precise for `throw`s if we compare event types // and create links to outer catch BBs only when the exception is not // caught.

I think this applies for calls, too. For example:

try try call $foo catch $e1 ... catch $e2 ... end catch $e1 ... catch $e3 ... end

Here the call to $foo can go to either of the inner catch blocks or the outer catch of $e3, but it will never arrive directly in the outer catch of $e1.

Thanks, that can be a good optimization opportunity. Do you mind if I mark that as a TODO for now?

Added the following TODO comment:

// TODO This can also be more precise if we analyze the structure of nested // try-catches. For example, in the example below, 'call $foo' doesn't need // a link to the BB of outer 'catch $e1', because if the exception thrown by // the call is of event $e1, it would've already been caught by the inner // 'catch $e1'. Optimize these cases later. // try // try // call $foo // catch $e1 // ... // catch $e2 // ... // end // catch $e1 // ... // catch $e3 // ... // end

Sounds good!

tlively · 2021-01-15T22:36:46Z

src/cfg/cfg-traversal.h

+        auto& catchBodies = curr->cast<Try>()->catchBodies;
+        using namespace std::placeholders;
+        for (Index i = 0; i < catchBodies.size(); i++) {
+          auto doEndCatchI = std::bind(doEndCatch, _1, _2, i);


Lambdas would be more readable than std::bind IMO.

I agree, but lambdas capturing variables cannot be passed as a function argument. (Here we need to capture i) Do you know any way around it?

Would capture-by-copy using [i](...) { doEndCatch(..., i); } work?

That works, thanks. I was a little confused while experimenting with different settings; what didn't allow capturing lambdas to be passed as a function argument is the current TaskFunc, which is a function pointer. If we change TaskFunc to std::function, I mean, from

typedef void (*TaskFunc)(SubType*, Expression**);

to

using TaskFunc = std::function<void(SubType*, Expression**)>;

This seems to be able to take capturing lambdas without problem.

kripken

Interesting std::function does not affect performance. That's better than I expected. But I guess even if it does do some extra check, it's negligible compared to the cost of the indirect call itself.

`Walker::TaskFunc` has changed from a function pointer to `std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker` easier. We didn't notice much performance difference then, but it was recently reported that it creased binaryen.js code size and performance. This changes `Walker::TaskFunc` back to a function pointer and does a little more work to manage catch index in `CFGWalker` side. Hopefully fixes WebAssembly#3857.

`Walker::TaskFunc` has changed from a function pointer to `std::function` in #3494, mainly to make the EH support for `CFGWalker` easier. We didn't notice much performance difference then, but it was recently reported that it creased binaryen.js code size and performance. This changes `Walker::TaskFunc` back to a function pointer and does a little more work to manage catch index in `CFGWalker` side. Hopefully fixes #3857.

This adds support `CFGWalker` for the new EH instructions (`try_table` and `throw_ref`). `CFGWalker` is used by many different passes, but in the same vein as WebAssembly#3494, this adds tests for `RedundantSetElimination` pass. `rse-eh.wast` file is created from translated and simplified version of `rse-eh-old.wast`, but many tests were removed because we don't have special `catch` block or `delegate` anymore.

This adds support `CFGWalker` for the new EH instructions (`try_table` and `throw_ref`). `CFGWalker` is used by many different passes, but in the same vein as #3494, this adds tests for `RedundantSetElimination` pass. `rse-eh.wast` file is created from translated and simplified version of `rse-eh-old.wast`, but many tests were removed because we don't have special `catch` block or `delegate` anymore.

This adds support `CFGWalker` for the new EH instructions (`try_table` and `throw_ref`). `CFGWalker` is used by many different passes, but in the same vein as WebAssembly#3494, this adds tests for `RedundantSetElimination` pass. `rse-eh.wast` file is created from translated and simplified version of `rse-eh-old.wast`, but many tests were removed because we don't have special `catch` block or `delegate` anymore.

aheejin requested review from kripken and tlively January 15, 2021 10:19

clang-format

2297dfe

kripken reviewed Jan 15, 2021

View reviewed changes

aheejin and others added 2 commits January 16, 2021 03:57

Update src/cfg/cfg-traversal.h

6c60880

Co-authored-by: Alon Zakai <[email protected]>

Address comments

7ee8a09

aheejin commented Jan 15, 2021

View reviewed changes

kripken reviewed Jan 15, 2021

View reviewed changes

tlively reviewed Jan 15, 2021

View reviewed changes

aheejin added 3 commits January 16, 2021 08:55

Addres comments

c211825

Merge branch 'master' into eh_cfg_traversal

65d4ff9

more fix

1077763

aheejin mentioned this pull request Jan 18, 2021

Update interpreter for new EH spec #3498

Merged

Base automatically changed from master to main January 19, 2021 21:59

aheejin added 2 commits January 20, 2021 05:07

Lamda capture by value + use `std::function

024210d

Add TODO comment

53ff4b5

aheejin mentioned this pull request Jan 20, 2021

Debug info handling for new EH try-catch #3496

Merged

tlively approved these changes Jan 20, 2021

View reviewed changes

kripken approved these changes Jan 20, 2021

View reviewed changes

aheejin merged commit b77d0af into WebAssembly:main Jan 21, 2021

aheejin deleted the eh_cfg_traversal branch January 21, 2021 12:10

aheejin mentioned this pull request May 14, 2021

Much slower compilation compare to v98 #3857

Closed

aheejin mentioned this pull request May 20, 2021

[EH] Change Walker::TaskFunc back to function pointer #3899

Merged

aheejin mentioned this pull request Jan 24, 2024

[EH] Support CFGWalker for new EH spec #6235

Merged

CFG traversal for the new EH spec #3494

CFG traversal for the new EH spec #3494

Uh oh!

Conversation

aheejin commented Jan 15, 2021

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin Jan 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin Jan 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kripken left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aheejin Jan 16, 2021 •

edited

Loading

aheejin Jan 20, 2021 •

edited

Loading