[mypyc] Speed up in operations for list/tuple #9004

jdahlin · 2020-06-15T20:24:48Z

When the right-hand side of an in/not in operation is a literal
list/tuple, simplify it into direct equality comparison
expressions and join them with boolean or/and.
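To make the transformation concrete, here is a minimal sketch of the same kind of rewrite done at the Python source level with the standard library ast module (ast.unparse needs Python 3.9+). This is not mypyc's implementation, which builds mypy AST nodes inside IRBuilder. The names InLiteralRewriter, rewrite and MAX_ITEMS are hypothetical. The sketch only rewrites a bare name on the left, so the left operand is never evaluated more than once, and it borrows the 16-item cutoff from the n_items < 16 check quoted in the review.

```python
import ast

MAX_ITEMS = 16  # assumption: mirrors the `n_items < 16` cutoff seen in review


class InLiteralRewriter(ast.NodeTransformer):
    """Hypothetical: rewrite `x in [a, b]` into `x == a or x == b`
    and `x not in (a, b)` into `x != a and x != b`."""

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)  # rewrite nested comparisons first
        # Only single-operator comparisons, not chains like `a < x in [...]`.
        if len(node.ops) != 1 or not isinstance(node.ops[0], (ast.In, ast.NotIn)):
            return node
        rhs = node.comparators[0]
        # Only literal list/tuple displays with a bare name on the left, so
        # repeating the left operand cannot duplicate side effects.
        if not isinstance(rhs, (ast.List, ast.Tuple)) or not isinstance(node.left, ast.Name):
            return node
        items = rhs.elts
        if len(items) >= MAX_ITEMS or any(isinstance(i, ast.Starred) for i in items):
            return node
        negate = isinstance(node.ops[0], ast.NotIn)
        if not items:
            # `x in []` is always False; `x not in []` is always True.
            new: ast.expr = ast.Constant(value=negate)
        else:
            tests = [
                ast.Compare(
                    left=ast.Name(id=node.left.id, ctx=ast.Load()),
                    ops=[ast.NotEq() if negate else ast.Eq()],
                    comparators=[item],
                )
                for item in items
            ]
            # `in` joins `==` tests with `or`; `not in` joins `!=` tests with `and`.
            if len(tests) == 1:
                new = tests[0]
            else:
                new = ast.BoolOp(op=ast.And() if negate else ast.Or(), values=tests)
        return ast.copy_location(new, node)


def rewrite(source: str) -> str:
    tree = InLiteralRewriter().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(rewrite("found = x in [1, 2, 3]"))  # found = x == 1 or x == 2 or x == 3
print(rewrite("x not in (1, 2)"))         # x != 1 and x != 2
print(rewrite("f() in [1, 2]"))           # unchanged: f() in [1, 2]
```

One caveat for any rewrite of this shape: list and tuple containment checks identity before equality, so for a float nan = float("nan"), nan in [nan] is True while nan == nan is False. An == chain is therefore only equivalent for items whose __eq__ is reflexive.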

Part of mypyc/mypyc#726, but this only speeds up list/tuple.

This is my first contribution to mypy/mypyc, so please let me know if there's anything I can do to improve the pull request. I ended up creating new tree nodes (OpExpr/ComparisonExpr) inside IRBuilder, which probably doesn't generate IR as efficiently as it could. If that needs to change, let me know, and please give me some pointers on how to build more efficient IR. I'm happy to add tests for the specific IR generated as well if needed.

I didn't run any macro benchmarks on mypy itself; I'd be happy to learn what is normally benchmarked and how to run it.

# before (x = 10)
2000000 loops, best of 5: 113 nsec per loop  # x in [1]
2000000 loops, best of 5: 116 nsec per loop  # x in [1, 2]
2000000 loops, best of 5: 128 nsec per loop  # x in [1, 2, 3]
2000000 loops, best of 5: 136 nsec per loop  # x in [1, 2, 3, 4]
2000000 loops, best of 5: 145 nsec per loop  # x in [1, 2, 3, 4, 5]

5000000 loops, best of 5: 88.7 nsec per loop  # x in (1)
5000000 loops, best of 5: 97.6 nsec per loop  # x in (1, 2)
2000000 loops, best of 5: 108 nsec per loop  # x in (1, 2, 3)
2000000 loops, best of 5: 118 nsec per loop  # x in (1, 2, 3, 4)
2000000 loops, best of 5: 129 nsec per loop  # x in (1, 2, 3, 4, 5)

# after (x = 10)
5000000 loops, best of 5: 54.8 nsec per loop  # x in [1]
5000000 loops, best of 5: 55.9 nsec per loop  # x in [1, 2]
5000000 loops, best of 5: 56 nsec per loop  # x in [1, 2, 3]
5000000 loops, best of 5: 55.5 nsec per loop  # x in [1, 2, 3, 4]
5000000 loops, best of 5: 55.1 nsec per loop  # x in [1, 2, 3, 4, 5]

5000000 loops, best of 5: 55 nsec per loop  # x in (1)
5000000 loops, best of 5: 55.1 nsec per loop  # x in (1, 2)
5000000 loops, best of 5: 54.9 nsec per loop  # x in (1, 2, 3)
5000000 loops, best of 5: 55.8 nsec per loop  # x in (1, 2, 3, 4)
5000000 loops, best of 5: 55.6 nsec per loop  # x in (1, 2, 3, 4, 5)

For reference, using CPython 3.8.2:

5000000 loops, best of 5: 57 nsec per loop  # x in (1)
5000000 loops, best of 5: 61.6 nsec per loop  # x in (1, 2)
5000000 loops, best of 5: 72 nsec per loop  # x in (1, 2, 3)
5000000 loops, best of 5: 76.8 nsec per loop  # x in (1, 2, 3, 4)
5000000 loops, best of 5: 84.6 nsec per loop  # x in (1, 2, 3, 4, 5)
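The numbers above are in the output format of Python's timeit. As a rough sketch of how comparable figures can be collected for plain CPython (like the CPython 3.8.2 reference numbers), the snippet below times a few statements with timeit.repeat and reports the best per-loop time. The exact harness used for the mypyc numbers in this PR is not stated; timing compiled code would additionally require building a module with mypyc and timing calls into it. The bench helper and CASES list are illustrative names, not part of mypyc.

```python
import timeit

CASES = [
    "x in [1, 2, 3]",
    "x in (1, 2, 3)",
    "x == 1 or x == 2 or x == 3",  # hand-expanded form of the same test
]


def bench(stmt: str, setup: str = "x = 10", number: int = 200_000, repeat: int = 5) -> float:
    """Return the best-of-`repeat` time per loop in nanoseconds."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    for stmt in CASES:
        print(f"best of 5: {bench(stmt):.1f} nsec per loop  # {stmt}")
```

A single case can also be timed from the shell with python -m timeit -s "x = 10" "x in (1, 2, 3)", which prints lines in the same format as above.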

TH3CHARLie

I like this specialization. cc @JukkaL @msullivan

mypyc/irbuild/expression.py

msullivan

This looks great, thanks! I've just got one thing, which you can address if you want, or just put in a TODO if you don't.

msullivan · 2020-06-17T02:23:10Z

mypyc/irbuild/expression.py

+        elif n_items < 16:
+            bin_op = 'or' if e.operators[0] == 'in' else 'and'
+            lhs = e.operands[0]
+            exprs = (ComparisonExpr([cmp_op], [lhs, item]) for item in items)


I /think/ that since these expressions don't appear in the type table, they will get coerced to object, leading to some kind of pointless boxing. I guess that isn't actually that expensive for bools, but it's worth cleaning up. Fine to just put a TODO in for now if you don't feel like cleaning it up now.

jdahlin · 2020-06-17

Thanks for the review @msullivan. I'd be happy to fix this, but I would probably need some more specific pointers on what needs to be modified for that to work.

jdahlin · 2020-06-19 (edited)

@msullivan I ended up, perhaps somewhat hackily, shortcutting all OpExpr/ComparisonExpr nodes without types to bool_rprimitive. That removed the excessive boxing/unboxing, and with these changes it's significantly faster: I measured somewhere between 46% and 78% in micro benchmarks. It seems to trigger some happy path in the C compiler, as the length of the tuple/list is no longer relevant for performance (tested with sequences of up to 16 ints).

msullivan · 2020-06-25

Happy to hear it is a lot faster! I don't like this hack, though.

I think there are two ways forward:

Directly generate the code without creating an AST for it first. One way to do this would involve using shortcircuit_helper, though it could probably also just be done directly.

Put all of the generated expressions into the type table. This could probably be done ergonomically by adding a helper method that takes an expression and a type, adds it to the table, and returns the expression.

Historically though we haven't really done AST generation as part of compilation (I think mostly because it would require populating the type table), but it seems fine to allow if it is done tastefully.
(@JukkaL, do you agree?)
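The second option above could look roughly like the sketch below. It is self-contained and uses stand-in node classes rather than mypy's real Expression classes; with_type, build_in_chain and TypeTable are hypothetical names, and the type table is modeled as a plain dict from node to a type-name string. The idea is that the helper records a type for each synthesized node and returns the node, so construction can be wrapped inline without a separate bookkeeping pass.

```python
from typing import Dict, List, Sequence, TypeVar


class Expression:
    """Stand-in for an AST expression node; hashes by identity."""


class NameExpr(Expression):
    def __init__(self, name: str) -> None:
        self.name = name


class ComparisonExpr(Expression):
    def __init__(self, operators: List[str], operands: List[Expression]) -> None:
        self.operators = operators
        self.operands = operands


class OpExpr(Expression):
    def __init__(self, op: str, left: Expression, right: Expression) -> None:
        self.op = op
        self.left = left
        self.right = right


E = TypeVar("E", bound=Expression)
TypeTable = Dict[Expression, str]  # assumption: node -> type name


def with_type(types: TypeTable, expr: E, typ: str) -> E:
    """Record `typ` for a synthesized node and return the node unchanged."""
    types[expr] = typ
    return expr


def build_in_chain(types: TypeTable, lhs: Expression, items: Sequence[Expression],
                   negate: bool = False) -> Expression:
    """Build `lhs == a or lhs == b ...` (or `!=` joined by `and`), typing every node as bool."""
    if not items:
        raise ValueError("empty literal: fold to a constant instead")
    cmp_op = "!=" if negate else "=="
    bin_op = "and" if negate else "or"
    exprs = [with_type(types, ComparisonExpr([cmp_op], [lhs, item]), "builtins.bool")
             for item in items]
    result: Expression = exprs[0]
    for expr in exprs[1:]:
        # Left-fold: ((c1 or c2) or c3) ...
        result = with_type(types, OpExpr(bin_op, result, expr), "builtins.bool")
    return result
```

Because the stand-in nodes hash by identity, every synthesized node gets its own entry even when two nodes look alike, which is what a per-node type table needs.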

jdahlin · 2020-06-25

No problem, I'll populate the type table with these types then. shortcircuit_helper scared me a bit; it seemed more straightforward to generate AST nodes for this issue.

msullivan · 2020-06-26T01:46:14Z

Looks good. Check out the lint failure, though.

mypyc/irbuild/expression.py

JukkaL · 2020-09-26T16:46:38Z

@msullivan @jdahlin I wonder what needs to be done (beyond fixing merge conflicts) to get this merged? This would be really nice to have.

jdahlin · 2020-09-26T17:37:02Z

I believe nothing else is needed apart from the merge conflicts. I will see if I can fix these in the next couple of days.

jdahlin · 2020-09-28T11:15:26Z

@JukkaL I've finished merging it with latest master, all tests pass now.

TH3CHARLie

LGTM! Thanks for speeding up mypyc!

mypyc/test-data/irbuild-tuple.test

JukkaL · 2020-09-28T16:14:04Z

Thanks! I expect to see some nice improvements in benchmark results after the next nightly run (https://github.com/mypyc/mypyc-benchmarks).

jdahlin · 2020-09-28T16:35:13Z

Thanks for merging it @JukkaL! Is there a way to see the results for the nightly build somewhere? I'm also curious about the effect of this on larger pieces of code.

TH3CHARLie · 2020-09-28T16:50:12Z

https://github.com/mypyc/mypyc-benchmark-results/blob/master/reports/summary-microbenchmarks.md would be the place to find the related results. The performance boost would be less significant on larger code, though.

JukkaL · 2020-09-28T17:19:31Z

For many optimizations, microbenchmarks work well to estimate the level of improvement. Even for a very good optimization, the impact on most major benchmarks can be below the measurement noise floor, or we might have no major benchmarks that happen to use the targeted feature in sufficient volume to be affected.

This should get gradually better as we add more benchmarks. If we have an optimization that isn't reflected in any existing benchmarks, it may be a good idea to add another (micro)benchmark to catch regressions in the future.

TH3CHARLie · 2020-09-29T02:36:38Z

Judging from the in_list and in_tuple microbenchmark results, the performance boost from this PR is huge.

TH3CHARLie reviewed Jun 16, 2020

mypyc/irbuild/expression.py (outdated)

msullivan reviewed Jun 17, 2020

jdahlin force-pushed the mypyc-in-operation branch from 55ad10d to 870e1d5 June 19, 2020 20:38

Johan Dahlin added 8 commits June 25, 2020 20:08

[mypyc] Speed up in operations for list/tuple

67a7742

When the right-hand side of an in/not in operation is a literal list/tuple, simplify it into direct equality comparison expressions and join them with boolean or/and. Yields a speedup of up to 46% in micro benchmarks.

Inline reduce

4c5dd78

This makes it easier to add type annotations for subtypes.

Use type comment for Python 3.5 compatibility

186d50d

Shortcut typeless Comparison/OpExpr as bool

1572ee0

Order of comparisons by most likely

c4ee283

Add an IR test for in tuple/list literal

70c8cd5

Regenerate test against HEAD

21b419e

Specify types directly when creating AST nodes.

1623477

jdahlin force-pushed the mypyc-in-operation branch from 870e1d5 to 1623477 June 25, 2020 23:08

Remove unused bool_rprimitive to pass pyflakes

6ed0964

r3m0t suggested changes Jul 9, 2020

mypyc/irbuild/expression.py (outdated)

Update mypyc/irbuild/expression.py

abb4555

Ensure we only operate on ComparisonExpr with at most one operator. Co-authored-by: Tomer Chachamu <[email protected]>

Johan Dahlin added 6 commits September 27, 2020 22:29

Merge remote-tracking branch 'upstream/master' into mypyc-in-operation

d32d717

Merge remote-tracking branch 'upstream/master' into mypyc-in-operation

b917da0

Add back operator tests, use builder.false/true

Merge remote-tracking branch 'origin/mypyc-in-operation' into mypyc-in-operation

16245b2

Regenerate IR

7c599c0

irbuild-tuple: int64 -> native_int

b92d467

Wrap line to make flake8 happy

62c6ddd

TH3CHARLie approved these changes Sep 28, 2020

mypyc/test-data/irbuild-tuple.test (outdated)

Update mypyc/test-data/irbuild-tuple.test

c9675cf

Co-authored-by: Xuanda Yang <[email protected]>

JukkaL merged commit 8bf770d into python:master Sep 28, 2020

jdahlin deleted the mypyc-in-operation branch September 29, 2020 14:20

XiaoXuan42 mentioned this pull request Apr 10, 2021

Faster "in" operation against tuple/list/set expression mypyc/mypyc#726

Closed
