Skip to content

CFG traversal for the new EH spec #3494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 21, 2021
Merged

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Jan 15, 2021

This updates CFG traversal to match the new spec. Previously there was
only a single catch block that caught all exceptions, so all throwing
instructions needed to have a link to its innermost catch BB. But now we
can have multiple catches per try, requiring all throwing instrutions to
have an edge to all of those innermost catch BBs. Furthermore, if there
are only catches and not a catch_all in a try, throwing instructions
can further unwind to outer catches until they find a catch_all.
unwindCatchStack and unwindExprStack are necessary to track and make
correct links between throwing instructions and their unwind destination
BBs.

processCatchStack is used to remember the catch BBs currently being
processed, so that after processing all of them, we can make a link from
each of those catch's last block to the continuation block after the
try-catch.

RSE test cases are updated because they use the CFG traversal. The tests
there mainly test that if all possible CFG edge to a local.set sets
the same value to a local, the local.set is redundant and thus can be
removed.

This updates CFG traversal to match the new spec. Previously there was
only a single `catch` block that caught all exceptions, so all throwing
instructions needed to have a link to its innermost catch BB. But now we
can have multiple catches per try, requiring all throwing instrutions to
have an edge to all of those innermost catch BBs. Furthermore, if there
are only `catch`es and not a `catch_all` in a try, throwing instructions
can further unwind to outer catches until they find a `catch_all`.
`unwindCatchStack` and `unwindExprStack` are necessary to track and make
correct links between throwing instructions and their unwind destination
BBs.

`processCatchStack` is used to remember the catch BBs currently being
processed, so that after processing all of them, we can make a link from
each of those catch's last block to the continuation block after the
try-catch.

RSE test cases are updated because they use the CFG traversal. The tests
there mainly test that if all possible CFG edge to a `local.set` sets
the same value to a local, the `local.set` is redundant and thus can be
removed.
@aheejin aheejin requested review from kripken and tlively January 15, 2021 10:19
std::vector<Expression*> unwindExprStack;
// Contains the stack of the first blocks of catches currently being processed
// in the beginning. Will contain the last blocks of catches after being
// processed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is a little unclear to me. What does "beginning" mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to explain this in #3494 (comment). So this vector stores catch start BBs in doStartTry so that each doStartCatch can retrieve its start BB. After processing each catch doEndCatch stores each catch's end BB so that doEndTry can access it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to keep a single vector for this, how about

// A stack for each try, where each entry is a list of blocks, one for
// each catch, used during processing. We start by assigning the
// start blocks to here, and then read those at the appropriate time;
// when we finish a catch we write to here the end block, so that when
// we finish with them all we can connect the ends to the outside. In
// principle two vectors could be used, but their usage does not overlap
// in time, and this is more efficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

}

static void doEndCatch(SubType* self, Expression** currp, Index i) {
self->processCatchStack.back()[i] = self->currBasicBlock;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow what these two methods do. The start assigns the current basic block, and the end does the reverse assignment - is it not redundant? If not, what is it writing over?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there's N catches, doStartTry creates N BBs, each of which will serve as the start BB for each catch, and store them in unwindCatchBlock, so that any throwing instructions within the try body can create a link to them.

In doStartCatches, we move those N catch start BBs to processCatchBlock. We need this vector because when we enter each catch in doStartCatch, we should be able to retrieve the start BB created for that catch. But we can't leave those catch start BBs in unwindCatchBlock anymore because that stack is for processing unwinding, the process I described above (= any throwing instruction needs a link to every catch start BB in unwindCatchBlock.back()).

In doEndCatch, we save the end BB of each catch to processCatchStack. They are used in doEndTry that we create a link from each of those end catch BB to the continuation BB after the try-catch. Reusing processCatchStack is just to save some space, and we can use a separate vector for this; let me know if you think this is confusing and it'd be better to use separate vectors.

I'm not sure if I described this well; it was hard to describe in the comments too. If you have any suggestions to improve the comments (or the code) I'd appreciate that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think I'm starting to get it.

Separate vectors is another option. Not sure it's worth the overhead, as I'm not sure how many catches we expect to see in practice...?

If we keep the same vector as now, I think comments in these two methods could be helpful, like

// Get the block that starts this catch

// We are done with this catch; set the block that ends it

Even though they don't add much information, they increase the chance of it making sense when reading just those two methods, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

self->pushTask(doEndCatchI, currp);
self->pushTask(SubType::scan, &catchBodies[i]);
auto doStartCatchI = std::bind(doStartCatch, _1, _2, i);
self->pushTask(doStartCatchI, currp);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asan says this line causes a memory leak, but I'm not very sure why... Do you have a guess?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very odd, no idea... code looks 100% valid to me.

Copy link
Member Author

@aheejin aheejin Jan 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I figured out what's going on. Surprisingly std::function's constructor does some dynamic allocation in some cases. (It seems this behavior is dropped in C++17, but we are using C++14.) And our SmallVector does not call destructors of objects on pop_back() of clear() when they are stored in fixed part. (flexible part is fine because it's using std::vector.) It was fine until now because what we've stored in SmallVector, such as Expressions, are all managed by a separate allocation arena.

Fixing this does not seem to be trivial. We can't call destructors on all values (there are primitive types), so SmallVector has to be somewhat complicated, and I'm not sure that's worth the effort. And first of all, given that std::function can cause a dynamic allocation, I'm not sure if we should use it in the part that can be performance critical. The reason I changed

typedef void (*TaskFunc)(SubType*, Expression**);

to

using TaskFunc = std::function<void(SubType*, Expression**)>;

was the result of std::bind can be passed as a std::function but not a function pointer. The reason I used std::bind and not a lambda was a lambda capturing variables cannot be passed as a function argument. (Here we need to capture i)

Maybe we should just use lambdas and make i a class variable or something, which seems less cleaner, but maybe we can't help..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, good find... C++ is annoying sometimes.

Yes, lambdas seem best for now, I can't think of a better option.

Copy link
Member Author

@aheejin aheejin Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to lambdas: #3494 (comment)
I was a little confused. I thought capturing lambdas couldn't be passed as a function argument but that turned out to be only the case when we use the function pointer as TaskFunc. After we changed it to std::function it works fine. Also, not sure why, but the asan memory leak error does not appear in the CG anymore 🤷🏻 (Was that linked to a condition related to std::bind? No idea) So I guess we will go with this.

About possible performance issues, I ran all wasm-opt tests with 1. TaskFunc being a function pointer and 2. TaskFunc being a std::function, in release build. The performance difference doesn't seem to be very concerning. WDYT? All numbers are measured three times but the first result was thrown out of concerns that it might be affected by the cache warming up.

  • TaskFunc is a function pointer
real    0m56.546s
user    0m46.372s
sys     1m20.068s

real    0m57.272s
user    0m46.733s
sys     1m21.784s
  • TaskFunc is a std::function
real    0m58.454s
user    0m47.954s
sys     1m22.177s

real    0m57.924s
user    0m47.401s
sys     1m22.224s

}

static void doEndCatch(SubType* self, Expression** currp, Index i) {
self->processCatchStack.back()[i] = self->currBasicBlock;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think I'm starting to get it.

Separate vectors is another option. Not sure it's worth the overhead, as I'm not sure how many catches we expect to see in practice...?

If we keep the same vector as now, I think comments in these two methods could be helpful, like

// Get the block that starts this catch

// We are done with this catch; set the block that ends it

Even though they don't add much information, they increase the chance of it making sense when reading just those two methods, I think.

std::vector<Expression*> unwindExprStack;
// Contains the stack of the first blocks of catches currently being processed
// in the beginning. Will contain the last blocks of catches after being
// processed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to keep a single vector for this, how about

// A stack for each try, where each entry is a list of blocks, one for
// each catch, used during processing. We start by assigning the
// start blocks to here, and then read those at the appropriate time;
// when we finish a catch we write to here the end block, so that when
// we finish with them all we can connect the ends to the outside. In
// principle two vectors could be used, but their usage does not overlap
// in time, and this is more efficient.

@@ -205,47 +211,79 @@ struct CFGWalker : public ControlFlowWalker<SubType, VisitorType> {
self->startUnreachableBlock();
}

static void doEndCall(SubType* self, Expression** currp) {
static void doEndThrowingInst(SubType* self, Expression** currp) {
// Every call can possibly throw, but we don't end the current basic block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call is no longer accurate throughout this comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to

Even if the instruction can possibly throw, we don't end the current basic block unless ...

self->pushTask(doEndCatchI, currp);
self->pushTask(SubType::scan, &catchBodies[i]);
auto doStartCatchI = std::bind(doStartCatch, _1, _2, i);
self->pushTask(doStartCatchI, currp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very odd, no idea... code looks 100% valid to me.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! I think @kripken's ideas for the comments are good, too.

// thrown can be caught by any of its outer catch block. And if that outer
// try-catch also does not have a catch_all, this continues until we
// encounter try-catch_all. Create a link to all those possible catch
// unwind destinations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to keep track of event types here so that a link is only created to the innermost catch block that handles each event type? AFAIK that would be more precise, but I don't know how precise this CFG needs to be. If it's not necessary now, it might be worth mentioning in a TODO if it would help with any optimizations later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do something like that for throws, but not calls, because we don't know what event type will be thrown from calls. And given that the vast majority of throwing instructions will be calls, I'm not sure how much impact that can have, but I'll add a TODO here:

// TODO This can be more precise for `throw`s if we compare event types    
// and create links to outer catch BBs only when the exception is not      
// caught.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this applies for calls, too. For example:

try
  try
    call $foo
  catch $e1
    ...
  catch $e2
    ...
  end
catch $e1
  ...
catch $e3
  ...
end

Here the call to $foo can go to either of the inner catch blocks or the outer catch of $e3, but it will never arrive directly in the outer catch of $e1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that can be a good optimization opportunity. Do you mind if I mark that as a TODO for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the following TODO comment:

    // TODO This can also be more precise if we analyze the structure of nested
    // try-catches. For example, in the example below, 'call $foo' doesn't need
    // a link to the BB of outer 'catch $e1', because if the exception thrown by
    // the call is of event $e1, it would've already been caught by the inner
    // 'catch $e1'. Optimize these cases later.
    // try
    //   try
    //     call $foo
    //   catch $e1
    //     ...
    //   catch $e2
    //     ...
    //   end
    // catch $e1
    //   ...
    // catch $e3
    //   ...
    // end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

auto& catchBodies = curr->cast<Try>()->catchBodies;
using namespace std::placeholders;
for (Index i = 0; i < catchBodies.size(); i++) {
auto doEndCatchI = std::bind(doEndCatch, _1, _2, i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lambdas would be more readable than std::bind IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but lambdas capturing variables cannot be passed as a function argument. (Here we need to capture i) Do you know any way around it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would capture-by-copy using [i](...) { doEndCatch(..., i); } work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works, thanks. I was a little confused while experimenting with different settings; what didn't allow capturing lambdas to be passed as a function argument is the current TaskFunc, which is a function pointer. If we change TaskFunc to std::function, I mean, from

typedef void (*TaskFunc)(SubType*, Expression**);

to

using TaskFunc = std::function<void(SubType*, Expression**)>; 

This seems to be able to take capturing lambdas without problem.

Base automatically changed from master to main January 19, 2021 21:59
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting std::function does not affect performance. That's better than I expected. But I guess even if it does do some extra check, it's negligible compared to the cost of the indirect call itself.

@aheejin aheejin merged commit b77d0af into WebAssembly:main Jan 21, 2021
@aheejin aheejin deleted the eh_cfg_traversal branch January 21, 2021 12:10
aheejin added a commit to aheejin/binaryen that referenced this pull request May 20, 2021
`Walker::TaskFunc` has changed from a function pointer to
`std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker`
easier. We didn't notice much performance difference then, but it was
recently reported that it creased binaryen.js code size and performance.
This changes `Walker::TaskFunc` back to a function pointer and does a
little more work to manage catch index in `CFGWalker` side.

Hopefully fixes WebAssembly#3857.
aheejin added a commit to aheejin/binaryen that referenced this pull request May 20, 2021
`Walker::TaskFunc` has changed from a function pointer to
`std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker`
easier. We didn't notice much performance difference then, but it was
recently reported that it creased binaryen.js code size and performance.
This changes `Walker::TaskFunc` back to a function pointer and does a
little more work to manage catch index in `CFGWalker` side.

Hopefully fixes WebAssembly#3857.
aheejin added a commit to aheejin/binaryen that referenced this pull request May 20, 2021
`Walker::TaskFunc` has changed from a function pointer to
`std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker`
easier. We didn't notice much performance difference then, but it was
recently reported that it creased binaryen.js code size and performance.
This changes `Walker::TaskFunc` back to a function pointer and does a
little more work to manage catch index in `CFGWalker` side.

Hopefully fixes WebAssembly#3857.
aheejin added a commit to aheejin/binaryen that referenced this pull request May 20, 2021
`Walker::TaskFunc` has changed from a function pointer to
`std::function` in WebAssembly#3494, mainly to make the EH support for `CFGWalker`
easier. We didn't notice much performance difference then, but it was
recently reported that it creased binaryen.js code size and performance.
This changes `Walker::TaskFunc` back to a function pointer and does a
little more work to manage catch index in `CFGWalker` side.

Hopefully fixes WebAssembly#3857.
aheejin added a commit that referenced this pull request May 21, 2021
`Walker::TaskFunc` has changed from a function pointer to
`std::function` in #3494, mainly to make the EH support for `CFGWalker`
easier. We didn't notice much performance difference then, but it was
recently reported that it creased binaryen.js code size and performance.
This changes `Walker::TaskFunc` back to a function pointer and does a
little more work to manage catch index in `CFGWalker` side.

Hopefully fixes #3857.
aheejin added a commit to aheejin/binaryen that referenced this pull request Jan 24, 2024
This adds support `CFGWalker` for the new EH instructions (`try_table`
and `throw_ref`). `CFGWalker` is used by many different passes, but in
the same vein as WebAssembly#3494, this adds tests for `RedundantSetElimination`
pass. `rse-eh.wast` file is created from translated and simplified
version of `rse-eh-old.wast`, but many tests were removed because we
don't have special `catch` block or `delegate` anymore.
aheejin added a commit that referenced this pull request Jan 26, 2024
This adds support `CFGWalker` for the new EH instructions (`try_table`
and `throw_ref`). `CFGWalker` is used by many different passes, but in
the same vein as #3494, this adds tests for `RedundantSetElimination`
pass. `rse-eh.wast` file is created from translated and simplified
version of `rse-eh-old.wast`, but many tests were removed because we
don't have special `catch` block or `delegate` anymore.
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
This adds support `CFGWalker` for the new EH instructions (`try_table`
and `throw_ref`). `CFGWalker` is used by many different passes, but in
the same vein as WebAssembly#3494, this adds tests for `RedundantSetElimination`
pass. `rse-eh.wast` file is created from translated and simplified
version of `rse-eh-old.wast`, but many tests were removed because we
don't have special `catch` block or `delegate` anymore.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants