-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[mypyc] Speed up in operations for list/tuple #9004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this specialization. cc @JukkaL @msullivan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks!. I've just got one thing that you can address if you want or just put a TODO in if you don't
mypyc/irbuild/expression.py
Outdated
elif n_items < 16: | ||
bin_op = 'or' if e.operators[0] == 'in' else 'and' | ||
lhs = e.operands[0] | ||
exprs = (ComparisonExpr([cmp_op], [lhs, item]) for item in items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I /think/ that since these expressions don't appear in the type table, they will get coerced to object, leading to some kind of pointless boxing. I guess that isn't actually that expensive for bools, but it's worth cleaning up. Fine to just put a TODO in for now if you don't feel like cleaning it up now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @msullivan, I'd be happy to fix this but I would probably need some more specific pointers of what needs to be modified for that to work.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msullivan I ended up, perhaps somewhat hacky to shortcut all OpExpr/ComparisonExpr without types as bool_primitive. That removed the excessive box/unboxing, with these changes it's significantly faster, I measured somewhere between 46%-78% for micro benchmarks. Seems like it triggers some happy path in the C compiler, as the length of the tuple/list is no longer relevant for performance. (tested sequentes up to 16 items of int)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to hear it is a lot faster! I don't like this hack, though.
I think there are two ways forward:
- Directly generate the code without creating an AST for it first. One way to do this would involve using
shortcircuit_helper
, though also it could probably just be done directly. - Put all of the generated expressions into the type table. This could probably be done ergonomically by adding a helper method that takes an expression and a type, adds it to the table, and returns the expression.
Historically though we haven't really done AST generation as part of compilation (I think mostly because it would require populating the type table), but it seems fine to allow if it is done tastefully.
(@JukkaL, do you agree?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I'll populate the type table with these types then, shortcut_helper
scared me a bit, it seemed more straight forward to me to generate AST nodes for this issue.
55ad10d
to
870e1d5
Compare
When right hand side of a in/not in operation is a literal list/tuple, simplify it into simpler direct equality comparision expressions and use binary and/or to join them. Yields speedup of up to 46% in micro benchmarks.
This makes it easier to add type annotations for subtypes.
870e1d5
to
1623477
Compare
Looks good. Check out the lint failure though |
Ensure we only operate on ComparisonExpr with at most one operator. Co-authored-by: Tomer Chachamu <[email protected]>
@msullivan @jdahlin I wonder what needs to be done (beyond fixing merge conflicts) to get this merged? This would be really nice to have. |
I believe nothing else is needed apart from the merge conflicts. I will see
if I can fix these in the next couple of days.
…On Sat, 26 Sep 2020 at 13:46, Jukka Lehtosalo ***@***.***> wrote:
@msullivan <https://github.com/msullivan> @jdahlin
<https://github.com/jdahlin> I wonder what needs to be done (beyond
fixing merge conflicts) to get this merged? This would be really nice to
have.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9004 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAASWQRKOMXNA5KWKA6OWHTSHYLHVANCNFSM4N6UAE5Q>
.
|
Add back operator tests, use builder.false/true
@JukkaL I've finished merging it with latest master, all tests pass now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for speeding up mypyc!
Co-authored-by: Xuanda Yang <[email protected]>
Thanks! I expect to see some nice improvements in benchmark results after the next nightly run (https://github.com/mypyc/mypyc-benchmarks). |
Thanks for merging it @JukkaL! Is there a way to see the results for the nightly build somewhere? I'm also curious about the effect of this on larger pieces of code. |
https://github.com/mypyc/mypyc-benchmark-results/blob/master/reports/summary-microbenchmarks.md would be the place to find out the related results. The performance boost would be less significant on larger code though. |
For many optimizations microbenchmarks work well to estimate the level of improvement. Even for a very good optimization the impact to most major benchmarks can be below the measurement noise floor, or we might have no major benchmarks that happen to use the targeted feature in sufficient volume to be affected. This should get gradually better as we add more benchmarks. If we have an optimization that isn't reflected in any existing benchmarks, it may be a good idea to add another (micro)benchmark to catch regressions in the future. |
From the results of microbenchmark |
When right hand side of a in/not in operation is a literal
list/tuple, simplify it into simpler direct equality comparision
expressions and use binary and/or to join them.
Part of mypyc/mypyc#726, but this only speeds up list/tuple.
This is my first contribution to mypy/mypyc, please let me know if there's anything I can do to improve the pull request. I ended up create new tree nodes (OpExpr/ComparisionExpr) inside IRBuilder which is probably not generating as efficient as it can be. If that needs to change let me know and please provide me with some pointers on how to build a more efficient ir. Happy to add tests for the specific IR generated as well if needed.
I didn't do any macro benchmarks on mypy itself, would be happy to know what's normally benchmarked and instructions on how to do so.