-
Notifications
You must be signed in to change notification settings - Fork 786
CFG traversal for the new EH spec #3494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This updates CFG traversal to match the new spec. Previously there was only a single `catch` block that caught all exceptions, so all throwing instructions needed to have a link to its innermost catch BB. But now we can have multiple catches per try, requiring all throwing instrutions to have an edge to all of those innermost catch BBs. Furthermore, if there are only `catch`es and not a `catch_all` in a try, throwing instructions can further unwind to outer catches until they find a `catch_all`. `unwindCatchStack` and `unwindExprStack` are necessary to track and make correct links between throwing instructions and their unwind destination BBs. `processCatchStack` is used to remember the catch BBs currently being processed, so that after processing all of them, we can make a link from each of those catch's last block to the continuation block after the try-catch. RSE test cases are updated because they use the CFG traversal. The tests there mainly test that if all possible CFG edge to a `local.set` sets the same value to a local, the `local.set` is redundant and thus can be removed.
src/cfg/cfg-traversal.h
Outdated
std::vector<Expression*> unwindExprStack; | ||
// Contains the stack of the first blocks of catches currently being processed | ||
// in the beginning. Will contain the last blocks of catches after being | ||
// processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is a little unclear to me. What does "beginning" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to explain this in #3494 (comment). So this vector stores catch start BBs in doStartTry
so that each doStartCatch
can retrieve its start BB. After processing each catch doEndCatch
stores each catch's end BB so that doEndTry
can access it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to keep a single vector for this, how about
// A stack for each try, where each entry is a list of blocks, one for
// each catch, used during processing. We start by assigning the
// start blocks to here, and then read those at the appropriate time;
// when we finish a catch we write to here the end block, so that when
// we finish with them all we can connect the ends to the outside. In
// principle two vectors could be used, but their usage does not overlap
// in time, and this is more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
} | ||
|
||
static void doEndCatch(SubType* self, Expression** currp, Index i) { | ||
self->processCatchStack.back()[i] = self->currBasicBlock; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow what these two methods do. The start assigns the current basic block, and the end does the reverse assignment - is it not redundant? If not, what is it writing over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there's N catches, doStartTry
creates N BBs, each of which will serve as the start BB for each catch, and store them in unwindCatchBlock
, so that any throwing instructions within the try body can create a link to them.
In doStartCatches
, we move those N catch start BBs to processCatchBlock
. We need this vector because when we enter each catch in doStartCatch
, we should be able to retrieve the start BB created for that catch. But we can't leave those catch start BBs in unwindCatchBlock
anymore because that stack is for processing unwinding, the process I described above (= any throwing instruction needs a link to every catch start BB in unwindCatchBlock.back()
).
In doEndCatch
, we save the end BB of each catch to processCatchStack
. They are used in doEndTry
that we create a link from each of those end catch BB to the continuation BB after the try-catch. Reusing processCatchStack
is just to save some space, and we can use a separate vector for this; let me know if you think this is confusing and it'd be better to use separate vectors.
I'm not sure if I described this well; it was hard to describe in the comments too. If you have any suggestions to improve the comments (or the code) I'd appreciate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think I'm starting to get it.
Separate vectors is another option. Not sure it's worth the overhead, as I'm not sure how many catches we expect to see in practice...?
If we keep the same vector as now, I think comments in these two methods could be helpful, like
// Get the block that starts this catch
// We are done with this catch; set the block that ends it
Even though they don't add much information, they increase the chance of it making sense when reading just those two methods, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Co-authored-by: Alon Zakai <[email protected]>
self->pushTask(doEndCatchI, currp); | ||
self->pushTask(SubType::scan, &catchBodies[i]); | ||
auto doStartCatchI = std::bind(doStartCatch, _1, _2, i); | ||
self->pushTask(doStartCatchI, currp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asan says this line causes a memory leak, but I'm not very sure why... Do you have a guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very odd, no idea... code looks 100% valid to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I figured out what's going on. Surprisingly std::function
's constructor does some dynamic allocation in some cases. (It seems this behavior is dropped in C++17, but we are using C++14.) And our SmallVector
does not call destructors of objects on pop_back()
of clear()
when they are stored in fixed
part. (flexible
part is fine because it's using std::vector
.) It was fine until now because what we've stored in SmallVector
, such as Expression
s, are all managed by a separate allocation arena.
Fixing this does not seem to be trivial. We can't call destructors on all values (there are primitive types), so SmallVector
has to be somewhat complicated, and I'm not sure that's worth the effort. And first of all, given that std::function
can cause a dynamic allocation, I'm not sure if we should use it in the part that can be performance critical. The reason I changed
typedef void (*TaskFunc)(SubType*, Expression**);
to
using TaskFunc = std::function<void(SubType*, Expression**)>;
was the result of std::bind
can be passed as a std::function
but not a function pointer. The reason I used std::bind
and not a lambda was a lambda capturing variables cannot be passed as a function argument. (Here we need to capture i
)
Maybe we should just use lambdas and make i
a class variable or something, which seems less cleaner, but maybe we can't help..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, good find... C++ is annoying sometimes.
Yes, lambdas seem best for now, I can't think of a better option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to lambdas: #3494 (comment)
I was a little confused. I thought capturing lambdas couldn't be passed as a function argument but that turned out to be only the case when we use the function pointer as TaskFunc
. After we changed it to std::function
it works fine. Also, not sure why, but the asan memory leak error does not appear in the CG anymore 🤷🏻 (Was that linked to a condition related to std::bind
? No idea) So I guess we will go with this.
About possible performance issues, I ran all wasm-opt tests with 1. TaskFunc
being a function pointer and 2. TaskFunc
being a std::function
, in release build. The performance difference doesn't seem to be very concerning. WDYT? All numbers are measured three times but the first result was thrown out of concerns that it might be affected by the cache warming up.
TaskFunc
is a function pointer
real 0m56.546s
user 0m46.372s
sys 1m20.068s
real 0m57.272s
user 0m46.733s
sys 1m21.784s
TaskFunc
is astd::function
real 0m58.454s
user 0m47.954s
sys 1m22.177s
real 0m57.924s
user 0m47.401s
sys 1m22.224s
} | ||
|
||
static void doEndCatch(SubType* self, Expression** currp, Index i) { | ||
self->processCatchStack.back()[i] = self->currBasicBlock; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think I'm starting to get it.
Separate vectors is another option. Not sure it's worth the overhead, as I'm not sure how many catches we expect to see in practice...?
If we keep the same vector as now, I think comments in these two methods could be helpful, like
// Get the block that starts this catch
// We are done with this catch; set the block that ends it
Even though they don't add much information, they increase the chance of it making sense when reading just those two methods, I think.
src/cfg/cfg-traversal.h
Outdated
std::vector<Expression*> unwindExprStack; | ||
// Contains the stack of the first blocks of catches currently being processed | ||
// in the beginning. Will contain the last blocks of catches after being | ||
// processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to keep a single vector for this, how about
// A stack for each try, where each entry is a list of blocks, one for
// each catch, used during processing. We start by assigning the
// start blocks to here, and then read those at the appropriate time;
// when we finish a catch we write to here the end block, so that when
// we finish with them all we can connect the ends to the outside. In
// principle two vectors could be used, but their usage does not overlap
// in time, and this is more efficient.
src/cfg/cfg-traversal.h
Outdated
@@ -205,47 +211,79 @@ struct CFGWalker : public ControlFlowWalker<SubType, VisitorType> { | |||
self->startUnreachableBlock(); | |||
} | |||
|
|||
static void doEndCall(SubType* self, Expression** currp) { | |||
static void doEndThrowingInst(SubType* self, Expression** currp) { | |||
// Every call can possibly throw, but we don't end the current basic block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call
is no longer accurate throughout this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to
Even if the instruction can possibly throw, we don't end the current basic block unless ...
self->pushTask(doEndCatchI, currp); | ||
self->pushTask(SubType::scan, &catchBodies[i]); | ||
auto doStartCatchI = std::bind(doStartCatch, _1, _2, i); | ||
self->pushTask(doStartCatchI, currp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very odd, no idea... code looks 100% valid to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall! I think @kripken's ideas for the comments are good, too.
src/cfg/cfg-traversal.h
Outdated
// thrown can be caught by any of its outer catch block. And if that outer | ||
// try-catch also does not have a catch_all, this continues until we | ||
// encounter try-catch_all. Create a link to all those possible catch | ||
// unwind destinations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to keep track of event types here so that a link is only created to the innermost catch block that handles each event type? AFAIK that would be more precise, but I don't know how precise this CFG needs to be. If it's not necessary now, it might be worth mentioning in a TODO if it would help with any optimizations later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do something like that for throw
s, but not calls, because we don't know what event type will be thrown from calls. And given that the vast majority of throwing instructions will be calls, I'm not sure how much impact that can have, but I'll add a TODO here:
// TODO This can be more precise for `throw`s if we compare event types
// and create links to outer catch BBs only when the exception is not
// caught.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this applies for calls, too. For example:
try
try
call $foo
catch $e1
...
catch $e2
...
end
catch $e1
...
catch $e3
...
end
Here the call to $foo can go to either of the inner catch blocks or the outer catch of $e3
, but it will never arrive directly in the outer catch of $e1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that can be a good optimization opportunity. Do you mind if I mark that as a TODO for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the following TODO comment:
// TODO This can also be more precise if we analyze the structure of nested
// try-catches. For example, in the example below, 'call $foo' doesn't need
// a link to the BB of outer 'catch $e1', because if the exception thrown by
// the call is of event $e1, it would've already been caught by the inner
// 'catch $e1'. Optimize these cases later.
// try
// try
// call $foo
// catch $e1
// ...
// catch $e2
// ...
// end
// catch $e1
// ...
// catch $e3
// ...
// end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
src/cfg/cfg-traversal.h
Outdated
auto& catchBodies = curr->cast<Try>()->catchBodies; | ||
using namespace std::placeholders; | ||
for (Index i = 0; i < catchBodies.size(); i++) { | ||
auto doEndCatchI = std::bind(doEndCatch, _1, _2, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lambdas would be more readable than std::bind
IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but lambdas capturing variables cannot be passed as a function argument. (Here we need to capture i
) Do you know any way around it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would capture-by-copy using [i](...) { doEndCatch(..., i); }
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works, thanks. I was a little confused while experimenting with different settings; what didn't allow capturing lambdas to be passed as a function argument is the current TaskFunc
, which is a function pointer. If we change TaskFunc
to std::function
, I mean, from
typedef void (*TaskFunc)(SubType*, Expression**);
to
using TaskFunc = std::function<void(SubType*, Expression**)>;
This seems to be able to take capturing lambdas without problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting std::function
does not affect performance. That's better than I expected. But I guess even if it does do some extra check, it's negligible compared to the cost of the indirect call itself.
`Walker::TaskFunc` has changed from a function pointer to `std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker` easier. We didn't notice much performance difference then, but it was recently reported that it creased binaryen.js code size and performance. This changes `Walker::TaskFunc` back to a function pointer and does a little more work to manage catch index in `CFGWalker` side. Hopefully fixes WebAssembly#3857.
`Walker::TaskFunc` has changed from a function pointer to `std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker` easier. We didn't notice much performance difference then, but it was recently reported that it creased binaryen.js code size and performance. This changes `Walker::TaskFunc` back to a function pointer and does a little more work to manage catch index in `CFGWalker` side. Hopefully fixes WebAssembly#3857.
`Walker::TaskFunc` has changed from a function pointer to `std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker` easier. We didn't notice much performance difference then, but it was recently reported that it creased binaryen.js code size and performance. This changes `Walker::TaskFunc` back to a function pointer and does a little more work to manage catch index in `CFGWalker` side. Hopefully fixes WebAssembly#3857.
`Walker::TaskFunc` has changed from a function pointer to `std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker` easier. We didn't notice much performance difference then, but it was recently reported that it creased binaryen.js code size and performance. This changes `Walker::TaskFunc` back to a function pointer and does a little more work to manage catch index in `CFGWalker` side. Hopefully fixes WebAssembly#3857.
`Walker::TaskFunc` has changed from a function pointer to `std::function` in #3494, mainly to make the EH support for `CFGWalker` easier. We didn't notice much performance difference then, but it was recently reported that it creased binaryen.js code size and performance. This changes `Walker::TaskFunc` back to a function pointer and does a little more work to manage catch index in `CFGWalker` side. Hopefully fixes #3857.
This adds support `CFGWalker` for the new EH instructions (`try_table` and `throw_ref`). `CFGWalker` is used by many different passes, but in the same vein as WebAssembly#3494, this adds tests for `RedundantSetElimination` pass. `rse-eh.wast` file is created from translated and simplified version of `rse-eh-old.wast`, but many tests were removed because we don't have special `catch` block or `delegate` anymore.
This adds support `CFGWalker` for the new EH instructions (`try_table` and `throw_ref`). `CFGWalker` is used by many different passes, but in the same vein as #3494, this adds tests for `RedundantSetElimination` pass. `rse-eh.wast` file is created from translated and simplified version of `rse-eh-old.wast`, but many tests were removed because we don't have special `catch` block or `delegate` anymore.
This adds support `CFGWalker` for the new EH instructions (`try_table` and `throw_ref`). `CFGWalker` is used by many different passes, but in the same vein as WebAssembly#3494, this adds tests for `RedundantSetElimination` pass. `rse-eh.wast` file is created from translated and simplified version of `rse-eh-old.wast`, but many tests were removed because we don't have special `catch` block or `delegate` anymore.
This updates CFG traversal to match the new spec. Previously there was
only a single
catch
block that caught all exceptions, so all throwinginstructions needed to have a link to its innermost catch BB. But now we
can have multiple catches per try, requiring all throwing instrutions to
have an edge to all of those innermost catch BBs. Furthermore, if there
are only
catch
es and not acatch_all
in a try, throwing instructionscan further unwind to outer catches until they find a
catch_all
.unwindCatchStack
andunwindExprStack
are necessary to track and makecorrect links between throwing instructions and their unwind destination
BBs.
processCatchStack
is used to remember the catch BBs currently beingprocessed, so that after processing all of them, we can make a link from
each of those catch's last block to the continuation block after the
try-catch.
RSE test cases are updated because they use the CFG traversal. The tests
there mainly test that if all possible CFG edge to a
local.set
setsthe same value to a local, the
local.set
is redundant and thus can beremoved.