bpo-46702: Specialize `UNPACK_SEQUENCE` #31240

brandtbucher · 2022-02-10T00:41:15Z

99% hit rate, with 2/3rds of the hits going to UNPACK_SEQUENCE_TWO_TUPLE.

1% improvement overall:

Slower (11):
- crypto_pyaes: 84.3 ms +- 0.5 ms -> 86.5 ms +- 0.9 ms: 1.03x slower
- nbody: 95.8 ms +- 2.7 ms -> 97.9 ms +- 3.3 ms: 1.02x slower
- json_dumps: 12.1 ms +- 0.1 ms -> 12.3 ms +- 0.1 ms: 1.02x slower
- richards: 50.9 ms +- 1.6 ms -> 51.8 ms +- 1.3 ms: 1.02x slower
- pycparser: 1.15 sec +- 0.02 sec -> 1.17 sec +- 0.02 sec: 1.02x slower
- telco: 6.46 ms +- 0.13 ms -> 6.57 ms +- 0.10 ms: 1.02x slower
- pyflate: 448 ms +- 3 ms -> 455 ms +- 3 ms: 1.01x slower
- pickle: 9.99 us +- 0.11 us -> 10.1 us +- 0.4 us: 1.01x slower
- logging_simple: 5.46 us +- 0.08 us -> 5.51 us +- 0.08 us: 1.01x slower
- django_template: 34.9 ms +- 0.3 ms -> 35.1 ms +- 0.5 ms: 1.01x slower
- thrift: 807 us +- 11 us -> 812 us +- 14 us: 1.01x slower

Faster (37):
- unpack_sequence: 51.4 ns +- 0.5 ns -> 47.6 ns +- 0.8 ns: 1.08x faster
- pidigits: 202 ms +- 0 ms -> 188 ms +- 0 ms: 1.07x faster
- pickle_list: 4.72 us +- 0.06 us -> 4.44 us +- 0.04 us: 1.06x faster
- logging_silent: 114 ns +- 0 ns -> 108 ns +- 1 ns: 1.06x faster
- chaos: 75.2 ms +- 0.9 ms -> 72.4 ms +- 0.8 ms: 1.04x faster
- hexiom: 6.86 ms +- 0.03 ms -> 6.68 ms +- 0.06 ms: 1.03x faster
- scimark_sparse_mat_mult: 5.07 ms +- 0.12 ms -> 4.94 ms +- 0.18 ms: 1.03x faster
- chameleon: 6.95 ms +- 0.12 ms -> 6.80 ms +- 0.08 ms: 1.02x faster
- scimark_fft: 346 ms +- 3 ms -> 339 ms +- 3 ms: 1.02x faster
- raytrace: 322 ms +- 3 ms -> 316 ms +- 4 ms: 1.02x faster
- unpickle_list: 5.09 us +- 0.08 us -> 4.99 us +- 0.05 us: 1.02x faster
- mako: 10.6 ms +- 0.1 ms -> 10.4 ms +- 0.1 ms: 1.02x faster
- pickle_dict: 28.9 us +- 0.1 us -> 28.3 us +- 0.1 us: 1.02x faster
- float: 79.7 ms +- 0.9 ms -> 78.5 ms +- 1.0 ms: 1.02x faster
- scimark_sor: 127 ms +- 1 ms -> 126 ms +- 1 ms: 1.01x faster
- go: 147 ms +- 2 ms -> 145 ms +- 1 ms: 1.01x faster
- xml_etree_process: 56.8 ms +- 0.5 ms -> 56.2 ms +- 0.5 ms: 1.01x faster
- unpickle_pure_python: 251 us +- 2 us -> 248 us +- 2 us: 1.01x faster
- scimark_monte_carlo: 72.3 ms +- 0.8 ms -> 71.6 ms +- 1.5 ms: 1.01x faster
- fannkuch: 396 ms +- 12 ms -> 393 ms +- 3 ms: 1.01x faster
- logging_format: 6.03 us +- 0.09 us -> 5.98 us +- 0.07 us: 1.01x faster
- xml_etree_generate: 79.7 ms +- 0.9 ms -> 79.1 ms +- 0.5 ms: 1.01x faster
- python_startup_no_site: 6.01 ms +- 0.00 ms -> 5.97 ms +- 0.00 ms: 1.01x faster
- regex_dna: 208 ms +- 1 ms -> 207 ms +- 1 ms: 1.01x faster
- sympy_expand: 497 ms +- 5 ms -> 494 ms +- 3 ms: 1.01x faster
- pathlib: 18.8 ms +- 0.2 ms -> 18.7 ms +- 0.2 ms: 1.01x faster
- xml_etree_iterparse: 106 ms +- 1 ms -> 105 ms +- 1 ms: 1.01x faster
- sympy_sum: 168 ms +- 1 ms -> 167 ms +- 2 ms: 1.01x faster
- python_startup: 8.70 ms +- 0.01 ms -> 8.64 ms +- 0.01 ms: 1.01x faster
- nqueens: 84.9 ms +- 0.5 ms -> 84.4 ms +- 1.0 ms: 1.01x faster
- regex_compile: 140 ms +- 1 ms -> 139 ms +- 1 ms: 1.01x faster
- pickle_pure_python: 335 us +- 3 us -> 333 us +- 2 us: 1.01x faster
- 2to3: 268 ms +- 1 ms -> 266 ms +- 1 ms: 1.01x faster
- dulwich_log: 66.1 ms +- 0.4 ms -> 65.7 ms +- 0.4 ms: 1.01x faster
- deltablue: 4.11 ms +- 0.04 ms -> 4.09 ms +- 0.04 ms: 1.01x faster
- sympy_integrate: 21.1 ms +- 0.1 ms -> 21.0 ms +- 0.2 ms: 1.00x faster
- sympy_str: 297 ms +- 3 ms -> 296 ms +- 3 ms: 1.00x faster

Benchmark hidden because not significant (14): html5lib, json, json_loads, meteor_contest, regex_effbot, regex_v8, scimark_lu, spectral_norm, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, tornado_http, unpickle, xml_etree_parse

Geometric mean: 1.01x faster

https://bugs.python.org/issue46702

sweeneyde · 2022-02-10T01:16:18Z

Is there interest in UNPACK_SEQUENCE_TWO_TUPLE__STORE_FAST__STORE_FAST? There was some discussion here:

Edit: I saw the comment on the issue -- makes sense.

markshannon · 2022-02-10T12:13:41Z

Would it make sense to specialize not for type, but for size in this case?
UNPACK_SEQUENCE_TWO would still need to test for tuples/lists, but would avoid the loop.
Maybe add UNPACK_SEQUENCE_THREE as well?

Something like

TARGET(UNPACK_SEQUENCE_TWO):
    PyObject *seq = TOP();
    PyObject **items;
    if (PyTuple_CheckExact(seq) &&
        PyTuple_GET_SIZE(seq) == 2) {
        items = ((PyTupleObject *)seq)->ob_item;
    } else if (PyList_CheckExact(seq) &&
               PyList_GET_SIZE(seq) == 2) {
        items = ((PyListObject *)seq)->ob_item;
    } else {
          goto general_unpack;
    }
    SET_TOP(Py_NewRef(items[1]));
    PUSH(Py_NewRef(items[0]));
    ...

p.s. Sorry about the merge conflict.

brandtbucher · 2022-02-10T18:21:23Z

Would it make sense to specialize not for type, but for size in this case? UNPACK_SEQUENCE_TWO would still need to test for tuples/lists, but would avoid the loop. Maybe add UNPACK_SEQUENCE_THREE as well?

Honestly, I can't remember the last time I unpacked a two-element list (or other non-tuple-iterable). Two-element tuples are just so deeply baked into the language that I have trouble seeing a payoff for adding the additional logic for handling everything else in the same instruction.

Three-element unpackings seem similarly rare (I can only think of os.walk and three-argument zip as possible common patterns).

I don't know, maybe they appear somewhat frequently in the benchmarks... but in that case, we should probably add specializations for lists and tuples of length ten. ;)

brandtbucher · 2022-02-11T19:42:27Z

Three-element unpackings seem similarly rare (I can only think of os.walk and three-argument zip as possible common patterns).

I don't know, maybe they appear somewhat frequently in the benchmarks...

Actually, it looks like nbody at least does lots of three-element list and tuple unpackings. I hadn't really considered 3D physics/graphics/etc...

#31254 should give us a better idea of the overall impact, at least. I'm still skeptical that specializing these will move the needle significantly vs. the two-element tuple case, so I'd like to explore and propose the less common patterns separately once these basic specializations are already in.

brandtbucher · 2022-02-14T20:21:42Z

The latest stats seem to suggest that this approach is the way to go for now. Then I'll try following up with UNPACK_SEQUENCE_TWO_TUPLE__STORE_FAST__STORE_FAST.

If I had to guess, I'd say that:

tuple n (17.7%) and list n (17.5%) are almost certainly just the unpack_sequence microbenchmark. The similar hit counts are a pretty big giveaway. In this case, n is a very unrealistic value of 10.
tuple 3 (8.1%) and list 3 (5.3%) are probably nbody.
tuple 4 (1.2%) is probably pidigits.

I'm much less inclined to specialize for each of these individual cases (at least right now). Perhaps the three-element case may be worth it at some point.

brandtbucher added 7 commits January 26, 2022 14:11

Specialize UNPACK_SEQUENCE

000ae2a

Simplify things a bit

5dab3be

Add UNPACK_SEQUENCE_TWO_TUPLE

502e537

Catch up with main

42a4676

Use NOTRACE_DISPATCH

fb1df0f

Clean things up a bit

5557efd

blurb add

d7c9fe1

brandtbucher requested a review from markshannon as a code owner February 10, 2022 00:41

the-knights-who-say-ni added the CLA signed label Feb 10, 2022

bedevere-bot added the awaiting core review label Feb 10, 2022

brandtbucher mentioned this pull request Feb 10, 2022

Specialize UNPACK_SEQUENCE faster-cpython/ideas#264

Closed

markshannon mentioned this pull request Feb 10, 2022

bpo-45923: Add RESUME_QUICK #31244

Merged

brandtbucher added 2 commits February 10, 2022 12:50

Catch up with main

94fbbba

Fixup

477e6ef

brandtbucher added 3 commits February 14, 2022 13:02

Catch up with main

079aea1

Add more stats

7d1944b

Don't classify failures when not collecting stats

c49b2fc

brandtbucher merged commit a9da085 into python:main Feb 16, 2022

bedevere-bot removed the awaiting core review label Feb 16, 2022

brandtbucher deleted the unpack-sequence branch July 21, 2022 20:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-46702: Specialize `UNPACK_SEQUENCE` #31240

bpo-46702: Specialize `UNPACK_SEQUENCE` #31240

brandtbucher commented Feb 10, 2022 •

edited by bedevere-bot

Loading

sweeneyde commented Feb 10, 2022 •

edited

Loading

markshannon commented Feb 10, 2022

brandtbucher commented Feb 10, 2022

brandtbucher commented Feb 11, 2022

brandtbucher commented Feb 14, 2022 •

edited

Loading

bpo-46702: Specialize UNPACK_SEQUENCE #31240

bpo-46702: Specialize UNPACK_SEQUENCE #31240

Conversation

brandtbucher commented Feb 10, 2022 • edited by bedevere-bot Loading

sweeneyde commented Feb 10, 2022 • edited Loading

markshannon commented Feb 10, 2022

brandtbucher commented Feb 10, 2022

brandtbucher commented Feb 11, 2022

brandtbucher commented Feb 14, 2022 • edited Loading

bpo-46702: Specialize `UNPACK_SEQUENCE` #31240

bpo-46702: Specialize `UNPACK_SEQUENCE` #31240

brandtbucher commented Feb 10, 2022 •

edited by bedevere-bot

Loading

sweeneyde commented Feb 10, 2022 •

edited

Loading

brandtbucher commented Feb 14, 2022 •

edited

Loading