Skip to content

bpo-46702: Specialize UNPACK_SEQUENCE #31240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Feb 16, 2022

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Feb 10, 2022

99% hit rate, with 2/3rds of the hits going to UNPACK_SEQUENCE_TWO_TUPLE.

1% improvement overall:

Slower (11):
- crypto_pyaes: 84.3 ms +- 0.5 ms -> 86.5 ms +- 0.9 ms: 1.03x slower
- nbody: 95.8 ms +- 2.7 ms -> 97.9 ms +- 3.3 ms: 1.02x slower
- json_dumps: 12.1 ms +- 0.1 ms -> 12.3 ms +- 0.1 ms: 1.02x slower
- richards: 50.9 ms +- 1.6 ms -> 51.8 ms +- 1.3 ms: 1.02x slower
- pycparser: 1.15 sec +- 0.02 sec -> 1.17 sec +- 0.02 sec: 1.02x slower
- telco: 6.46 ms +- 0.13 ms -> 6.57 ms +- 0.10 ms: 1.02x slower
- pyflate: 448 ms +- 3 ms -> 455 ms +- 3 ms: 1.01x slower
- pickle: 9.99 us +- 0.11 us -> 10.1 us +- 0.4 us: 1.01x slower
- logging_simple: 5.46 us +- 0.08 us -> 5.51 us +- 0.08 us: 1.01x slower
- django_template: 34.9 ms +- 0.3 ms -> 35.1 ms +- 0.5 ms: 1.01x slower
- thrift: 807 us +- 11 us -> 812 us +- 14 us: 1.01x slower

Faster (37):
- unpack_sequence: 51.4 ns +- 0.5 ns -> 47.6 ns +- 0.8 ns: 1.08x faster
- pidigits: 202 ms +- 0 ms -> 188 ms +- 0 ms: 1.07x faster
- pickle_list: 4.72 us +- 0.06 us -> 4.44 us +- 0.04 us: 1.06x faster
- logging_silent: 114 ns +- 0 ns -> 108 ns +- 1 ns: 1.06x faster
- chaos: 75.2 ms +- 0.9 ms -> 72.4 ms +- 0.8 ms: 1.04x faster
- hexiom: 6.86 ms +- 0.03 ms -> 6.68 ms +- 0.06 ms: 1.03x faster
- scimark_sparse_mat_mult: 5.07 ms +- 0.12 ms -> 4.94 ms +- 0.18 ms: 1.03x faster
- chameleon: 6.95 ms +- 0.12 ms -> 6.80 ms +- 0.08 ms: 1.02x faster
- scimark_fft: 346 ms +- 3 ms -> 339 ms +- 3 ms: 1.02x faster
- raytrace: 322 ms +- 3 ms -> 316 ms +- 4 ms: 1.02x faster
- unpickle_list: 5.09 us +- 0.08 us -> 4.99 us +- 0.05 us: 1.02x faster
- mako: 10.6 ms +- 0.1 ms -> 10.4 ms +- 0.1 ms: 1.02x faster
- pickle_dict: 28.9 us +- 0.1 us -> 28.3 us +- 0.1 us: 1.02x faster
- float: 79.7 ms +- 0.9 ms -> 78.5 ms +- 1.0 ms: 1.02x faster
- scimark_sor: 127 ms +- 1 ms -> 126 ms +- 1 ms: 1.01x faster
- go: 147 ms +- 2 ms -> 145 ms +- 1 ms: 1.01x faster
- xml_etree_process: 56.8 ms +- 0.5 ms -> 56.2 ms +- 0.5 ms: 1.01x faster
- unpickle_pure_python: 251 us +- 2 us -> 248 us +- 2 us: 1.01x faster
- scimark_monte_carlo: 72.3 ms +- 0.8 ms -> 71.6 ms +- 1.5 ms: 1.01x faster
- fannkuch: 396 ms +- 12 ms -> 393 ms +- 3 ms: 1.01x faster
- logging_format: 6.03 us +- 0.09 us -> 5.98 us +- 0.07 us: 1.01x faster
- xml_etree_generate: 79.7 ms +- 0.9 ms -> 79.1 ms +- 0.5 ms: 1.01x faster
- python_startup_no_site: 6.01 ms +- 0.00 ms -> 5.97 ms +- 0.00 ms: 1.01x faster
- regex_dna: 208 ms +- 1 ms -> 207 ms +- 1 ms: 1.01x faster
- sympy_expand: 497 ms +- 5 ms -> 494 ms +- 3 ms: 1.01x faster
- pathlib: 18.8 ms +- 0.2 ms -> 18.7 ms +- 0.2 ms: 1.01x faster
- xml_etree_iterparse: 106 ms +- 1 ms -> 105 ms +- 1 ms: 1.01x faster
- sympy_sum: 168 ms +- 1 ms -> 167 ms +- 2 ms: 1.01x faster
- python_startup: 8.70 ms +- 0.01 ms -> 8.64 ms +- 0.01 ms: 1.01x faster
- nqueens: 84.9 ms +- 0.5 ms -> 84.4 ms +- 1.0 ms: 1.01x faster
- regex_compile: 140 ms +- 1 ms -> 139 ms +- 1 ms: 1.01x faster
- pickle_pure_python: 335 us +- 3 us -> 333 us +- 2 us: 1.01x faster
- 2to3: 268 ms +- 1 ms -> 266 ms +- 1 ms: 1.01x faster
- dulwich_log: 66.1 ms +- 0.4 ms -> 65.7 ms +- 0.4 ms: 1.01x faster
- deltablue: 4.11 ms +- 0.04 ms -> 4.09 ms +- 0.04 ms: 1.01x faster
- sympy_integrate: 21.1 ms +- 0.1 ms -> 21.0 ms +- 0.2 ms: 1.00x faster
- sympy_str: 297 ms +- 3 ms -> 296 ms +- 3 ms: 1.00x faster

Benchmark hidden because not significant (14): html5lib, json, json_loads, meteor_contest, regex_effbot, regex_v8, scimark_lu, spectral_norm, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, tornado_http, unpickle, xml_etree_parse

Geometric mean: 1.01x faster

https://bugs.python.org/issue46702

@sweeneyde
Copy link
Member

sweeneyde commented Feb 10, 2022

Is there interest in UNPACK_SEQUENCE_TWO_TUPLE__STORE_FAST__STORE_FAST? There was some discussion here:

Edit: I saw the comment on the issue -- makes sense.

@markshannon
Copy link
Member

Would it make sense to specialize not for type, but for size in this case?
UNPACK_SEQUENCE_TWO would still need to test for tuples/lists, but would avoid the loop.
Maybe add UNPACK_SEQUENCE_THREE as well?

Something like

TARGET(UNPACK_SEQUENCE_TWO):
    PyObject *seq = TOP();
    PyObject **items;
    if (PyTuple_CheckExact(seq) &&
        PyTuple_GET_SIZE(seq) == 2) {
        items = ((PyTupleObject *)seq)->ob_item;
    } else if (PyList_CheckExact(seq) &&
               PyList_GET_SIZE(seq) == 2) {
        items = ((PyListObject *)seq)->ob_item;
    } else {
          goto general_unpack;
    }
    SET_TOP(Py_NewRef(items[1]));
    PUSH(Py_NewRef(items[0]));
    ...

p.s. Sorry about the merge conflict.

@brandtbucher
Copy link
Member Author

Would it make sense to specialize not for type, but for size in this case? UNPACK_SEQUENCE_TWO would still need to test for tuples/lists, but would avoid the loop. Maybe add UNPACK_SEQUENCE_THREE as well?

Honestly, I can't remember the last time I unpacked a two-element list (or other non-tuple-iterable). Two-element tuples are just so deeply baked into the language that I have trouble seeing a payoff for adding the additional logic for handling everything else in the same instruction.

Three-element unpackings seem similarly rare (I can only think of os.walk and three-argument zip as possible common patterns).

I don't know, maybe they appear somewhat frequently in the benchmarks... but in that case, we should probably add specializations for lists and tuples of length ten. ;)

@brandtbucher
Copy link
Member Author

Three-element unpackings seem similarly rare (I can only think of os.walk and three-argument zip as possible common patterns).

I don't know, maybe they appear somewhat frequently in the benchmarks...

Actually, it looks like nbody at least does lots of three-element list and tuple unpackings. I hadn't really considered 3D physics/graphics/etc...

#31254 should give us a better idea of the overall impact, at least. I'm still skeptical that specializing these will move the needle significantly vs. the two-element tuple case, so I'd like to explore and propose the less common patterns separately once these basic specializations are already in.

@brandtbucher
Copy link
Member Author

brandtbucher commented Feb 14, 2022

The latest stats seem to suggest that this approach is the way to go for now. Then I'll try following up with UNPACK_SEQUENCE_TWO_TUPLE__STORE_FAST__STORE_FAST.

If I had to guess, I'd say that:

  • tuple n (17.7%) and list n (17.5%) are almost certainly just the unpack_sequence microbenchmark. The similar hit counts are a pretty big giveaway. In this case, n is a very unrealistic value of 10.
  • tuple 3 (8.1%) and list 3 (5.3%) are probably nbody.
  • tuple 4 (1.2%) is probably pidigits.

I'm much less inclined to specialize for each of these individual cases (at least right now). Perhaps the three-element case may be worth it at some point.

@brandtbucher brandtbucher merged commit a9da085 into python:main Feb 16, 2022
@brandtbucher brandtbucher deleted the unpack-sequence branch July 21, 2022 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants