gh-91247: improve performance of list and tuple repeat #32045

eendebakpt · 2022-03-22T12:43:54Z

Improve the performance of the list and tuple repeat methods by reducing the number of reference count operations and copying data with memcpy. The approach to reducing the number of memcpy invocations is similar to #31999, but because of how reference counts are handled in the list and tuple repeat methods, the code is slightly different.
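The memcpy strategy can be illustrated outside of C. The following is a minimal Python sketch, not the code from this PR: it assumes the approach is to copy the source once and then repeatedly copy the already-filled prefix of the destination, so the number of bulk copies grows roughly with log2(n) instead of n. The name `repeat_by_doubling` and the use of a `bytearray` as a stand-in for the item array are illustrative assumptions.

```python
def repeat_by_doubling(src: bytes, n: int) -> tuple[bytearray, int]:
    """Build src * n and report how many bulk copies were needed.

    Hypothetical helper: each slice assignment stands in for one memcpy call.
    """
    if n <= 0 or not src:
        return bytearray(), 0

    size = len(src)
    total = size * n
    dest = bytearray(total)

    # First bulk copy: place one instance of the source at the start.
    dest[:size] = src
    copied = size
    copies = 1

    # Each pass copies the filled prefix again, doubling the filled region
    # until the last pass, which only copies what is still missing.
    while copied < total:
        chunk = min(copied, total - copied)
        dest[copied:copied + chunk] = dest[:chunk]
        copied += chunk
        copies += 1

    return dest, copies


result, copies = repeat_by_doubling(b"abc", 1000)
print(len(result), copies)
```

A plain loop would need one copy per repetition, so the saving is largest for short sequences repeated many times, which is consistent with where the benchmarks in this PR show the biggest gains.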

Note: the specialization for the case of 1 item in a list or tuple can be removed with (almost) no loss of performance. More details and a comparison against the version with specializations (#91482) are in issue #91247.
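Part of the cost under discussion is reference counting: every slot in a repeated list or tuple holds its own reference to the item. The sketch below is only an illustration and assumes a standard CPython build, where `sys.getrefcount` reflects these references; the helper name `refcount_delta_after_repeat` is hypothetical.

```python
import sys


def refcount_delta_after_repeat(n: int) -> int:
    """Return how many references repeating a 1-item list adds to its item."""
    item = object()          # fresh object, so no other code holds it
    single = [item]

    before = sys.getrefcount(item)
    repeated = single * n    # all n slots refer to the same item
    after = sys.getrefcount(item)

    assert len(repeated) == n
    return after - before


print(refcount_delta_after_repeat(1000))
```

On CPython the difference is expected to equal n, which suggests why applying the count in fewer steps, rather than once per slot, could reduce the work done by a repeat.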

https://bugs.python.org/issue47091

eendebakpt · 2022-03-28T21:23:48Z

Microbenchmark (main against commit 8b2cc9c):

list(3) repeat 1: Mean +- std dev: [base] 142 ns +- 4 ns -> [pr2] 136 ns +- 2 ns: 1.04x faster
list(1) repeat 1: Mean +- std dev: [base] 59.0 ns +- 0.7 ns -> [pr2] 55.9 ns +- 2.1 ns: 1.05x faster
list(3) repeat inplace 1: Mean +- std dev: [base] 74.5 ns +- 0.5 ns -> [pr2] 72.4 ns +- 0.8 ns: 1.03x faster
tuple(4) repeat 1: Mean +- std dev: [base] 46.6 ns +- 1.1 ns -> [pr2] 48.0 ns +- 0.8 ns: 1.03x slower
list(100) repeat 2: Mean +- std dev: [base] 1.28 us +- 0.00 us -> [pr2] 1.26 us +- 0.01 us: 1.02x faster
list(3) repeat 2: Mean +- std dev: [base] 156 ns +- 3 ns -> [pr2] 146 ns +- 3 ns: 1.07x faster
list(3) repeat inplace 2: Mean +- std dev: [base] 106 ns +- 1 ns -> [pr2] 105 ns +- 3 ns: 1.01x faster
tuple(4) repeat 2: Mean +- std dev: [base] 146 ns +- 0 ns -> [pr2] 141 ns +- 2 ns: 1.03x faster
list(100) repeat 10: Mean +- std dev: [base] 4.35 us +- 0.05 us -> [pr2] 4.23 us +- 0.02 us: 1.03x faster
list(3) repeat 10: Mean +- std dev: [base] 267 ns +- 3 ns -> [pr2] 192 ns +- 5 ns: 1.39x faster
list(1) repeat 10: Mean +- std dev: [base] 68.8 ns +- 2.4 ns -> [pr2] 75.7 ns +- 1.8 ns: 1.10x slower
list(3) repeat inplace 10: Mean +- std dev: [base] 143 ns +- 1 ns -> [pr2] 129 ns +- 3 ns: 1.10x faster
tuple(4) repeat 10: Mean +- std dev: [base] 240 ns +- 3 ns -> [pr2] 269 ns +- 3 ns: 1.12x slower
list(100) repeat 1000: Mean +- std dev: [base] 409 us +- 1 us -> [pr2] 421 us +- 1 us: 1.03x slower
list(3) repeat 1000: Mean +- std dev: [base] 18.8 us +- 0.2 us -> [pr2] 5.42 us +- 0.28 us: 3.48x faster
list(1) repeat 1000: Mean +- std dev: [base] 2.01 us +- 0.02 us -> [pr2] 1.96 us +- 0.02 us: 1.03x faster
list(3) repeat inplace 1000: Mean +- std dev: [base] 4.93 us +- 0.16 us -> [pr2] 2.60 us +- 0.02 us: 1.90x faster
tuple(4) repeat 1000: Mean +- std dev: [base] 9.80 us +- 0.29 us -> [pr2] 6.98 us +- 0.47 us: 1.40x faster

Benchmark hidden because not significant (3): list(100) repeat 1, list(1) repeat 2, [control] list pop+append

Geometric mean: 1.14x faster
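To see how a summary figure like this can arise, the sketch below recomputes a geometric mean from the mean timings listed above. It is an approximation under two assumptions that the output does not confirm: that each of the three hidden benchmarks counts as a ratio of 1.0, and that rounding in the printed means is negligible.

```python
from statistics import geometric_mean

# (base mean, pr2 mean) copied from the results above. Units differ between
# rows (ns or us) but match within each pair, so the ratios are unitless.
timings = [
    (142, 136), (59.0, 55.9), (74.5, 72.4), (46.6, 48.0),   # repeat 1
    (1.28, 1.26), (156, 146), (106, 105), (146, 141),       # repeat 2
    (4.35, 4.23), (267, 192), (68.8, 75.7), (143, 129),     # repeat 10
    (240, 269),
    (409, 421), (18.8, 5.42), (2.01, 1.96), (4.93, 2.60),   # repeat 1000
    (9.80, 6.98),
]

# A ratio above 1.0 means the PR is faster than the base.
speedups = [base / pr2 for base, pr2 in timings]

# Assumption: the 3 benchmarks hidden as "not significant" count as 1.0.
hidden = [1.0] * 3

listed_only = geometric_mean(speedups)
with_hidden = geometric_mean(speedups + hidden)

print(f"listed benchmarks only: {listed_only:.2f}x")
print(f"including hidden ones:  {with_hidden:.2f}x")
```

Under those assumptions the second figure should land close to the reported 1.14x. The geometric mean is dominated by the few large speedups for short lists repeated 1000 times, so it says little about the cases that got slower.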

Code for benchmark

```python
import pyperf

runner = pyperf.Runner()

# Shared setup: a 3-item list, a 1-item list, and a 4-item tuple.
setup = 'a=[1,2,3]; a1=[1,]; t=(1,2,3,4)'

for n in [1, 2, 10, 1000]:
    runner.timeit(name=f"list(100) repeat {n}",
                  stmt=f"x=a*{n}; y=a*{n}",
                  setup='a=[1.]*100')

    runner.timeit(name=f"list(3) repeat {n}",
                  stmt=f"x=a*{n}; y=a*{n}",
                  setup=setup)

    runner.timeit(name=f"list(1) repeat {n}",
                  stmt=f"x=a1*{n};",
                  setup=setup)

    # The list is rebuilt in the statement so it does not grow between runs.
    runner.timeit(name=f"list(3) repeat inplace {n}",
                  stmt=f"a=[1,2,None]; a*={n}",
                  setup=setup)

    runner.timeit(name=f"tuple(4) repeat {n}",
                  stmt=f"x=t*{n}; y=t*{n}",
                  setup=setup)

# Control benchmark that does not exercise the repeat code paths.
runner.timeit(name="[control] list pop+append",
              stmt="x=a.pop(); a.append(1)",
              setup=setup)
```

ghost · 2022-04-10T19:37:01Z

Commit authors are required to sign the Contributor License Agreement.

MaxwellDupre

414 tests OK.
1 test failed:
test_embed
I don't think this failure is related, so with most tests passing this looks OK.

eendebakpt · 2022-04-28T18:51:48Z

@MaxwellDupre Thanks for the approval. Note that this PR was marked draft. There are two PRs to resolve #91247, this one and #91482.

I have a small preference for #91482, but this PR is also an improvement.

eendebakpt · 2022-05-11T08:23:51Z

Closing, as #91482 seems to be the better approach.

bedevere-bot added the awaiting review label Mar 22, 2022

the-knights-who-say-ni added the CLA signed label Mar 22, 2022

eendebakpt marked this pull request as draft March 22, 2022 14:02

eendebakpt mentioned this pull request Mar 23, 2022

bpo-47070: improve performance of array_inplace_repeat #31999

Merged

eendebakpt and others added 6 commits March 28, 2022 13:00

use memcpy in list repat and list inplace repeat

99a3f68

use memcpy in tuple repeat

c5740e8

fix debug build

ce0b547

remove double ref counting

c3ce05c

📜🤖 Added by blurb_it.

7d27c22

make implementations of list repeat and tuple repeat similar

109526a

eendebakpt force-pushed the performance/list_repeat_v2 branch from 64f74f8 to 109526a Compare March 28, 2022 11:00

eliminate duplicated code

c021e38

eendebakpt marked this pull request as ready for review March 28, 2022 11:31

eendebakpt marked this pull request as draft March 28, 2022 12:29

fix ci

05c2dfd

eendebakpt marked this pull request as ready for review March 28, 2022 13:27

eendebakpt added 2 commits March 28, 2022 22:39

no special case for n=1

74f828f

formatting

8b2cc9c

eendebakpt added 3 commits March 30, 2022 07:25

Merge branch 'main' into performance/list_repeat_v2

efbfb62

Merge branch 'main' into performance/list_repeat_v2

28fd5d6

Merge branch 'main' into performance/list_repeat_v2

4caef2e

eendebakpt marked this pull request as draft April 10, 2022 21:18

eendebakpt changed the title ~~bpo-47091: improve performance of list and tuple repeat~~ gh-91247: improve performance of list and tuple repeat Apr 12, 2022

This was referenced Apr 12, 2022

gh-91247: improve performance of list and tuple repeat (with specialization for n=1) #91482

Merged

Improve performance of list and tuple repeat methods #91247

Closed

MaxwellDupre approved these changes Apr 28, 2022

bedevere-bot added awaiting core review and removed awaiting review labels Apr 28, 2022

eendebakpt closed this May 11, 2022
