@eendebakpt
Contributor

@eendebakpt eendebakpt commented Mar 22, 2022

Improve the performance of the list repeat methods by reducing the number of reference count operations and copying data using memcpy. The approach to reduce the number of memcpy invocations is similar to #31999, but due to the handling of reference counts in the list and tuple repeat methods the code is slightly different.

Note: the specialization for the case of 1 item in a list or tuple can be removed with (almost) no loss of performance. More details and a comparison against the version with specializations (#91482) are in issue #91247.
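For readers unfamiliar with the copy strategy mentioned above, here is a minimal Python model of the idea. It is only a sketch, not the C code from this PR: the function name `repeat_by_doubling` is hypothetical, slice assignment stands in for `memcpy`, and the reference count handling that the C implementation has to do by hand is not modelled, because Python takes care of it automatically.

```python
def repeat_by_doubling(items, n):
    """Return `items` repeated `n` times, copying in growing chunks.

    Illustrative model only: each slice assignment plays the role of one
    memcpy call, so the number of copy operations grows with log2(n)
    instead of n.
    """
    size = len(items)
    if n <= 0 or size == 0:
        return []

    total = size * n
    out = [None] * total      # preallocate the destination buffer
    out[:size] = items        # first copy comes from the source
    copied = size

    while copied < total:
        # Copy as much of the already-filled prefix as still fits.
        chunk = min(copied, total - copied)
        out[copied:copied + chunk] = out[:chunk]
        copied += chunk

    return out


print(repeat_by_doubling([1, 2, 3], 4) == [1, 2, 3] * 4)
```

Because the filled prefix always has a length that is a multiple of `len(items)`, copying any leading part of it to the current write position keeps the repetition pattern intact.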

https://bugs.python.org/issue47091

@eendebakpt eendebakpt force-pushed the performance/list_repeat_v2 branch from 64f74f8 to 109526a March 28, 2022 11:00
@eendebakpt eendebakpt marked this pull request as ready for review March 28, 2022 11:31
@eendebakpt eendebakpt marked this pull request as draft March 28, 2022 12:29
@eendebakpt eendebakpt marked this pull request as ready for review March 28, 2022 13:27
@eendebakpt
Contributor Author

Microbenchmark (main against commit 8b2cc9c):

list(3) repeat 1: Mean +- std dev: [base] 142 ns +- 4 ns -> [pr2] 136 ns +- 2 ns: 1.04x faster
list(1) repeat 1: Mean +- std dev: [base] 59.0 ns +- 0.7 ns -> [pr2] 55.9 ns +- 2.1 ns: 1.05x faster
list(3) repeat inplace 1: Mean +- std dev: [base] 74.5 ns +- 0.5 ns -> [pr2] 72.4 ns +- 0.8 ns: 1.03x faster
tuple(4) repeat 1: Mean +- std dev: [base] 46.6 ns +- 1.1 ns -> [pr2] 48.0 ns +- 0.8 ns: 1.03x slower
list(100) repeat 2: Mean +- std dev: [base] 1.28 us +- 0.00 us -> [pr2] 1.26 us +- 0.01 us: 1.02x faster
list(3) repeat 2: Mean +- std dev: [base] 156 ns +- 3 ns -> [pr2] 146 ns +- 3 ns: 1.07x faster
list(3) repeat inplace 2: Mean +- std dev: [base] 106 ns +- 1 ns -> [pr2] 105 ns +- 3 ns: 1.01x faster
tuple(4) repeat 2: Mean +- std dev: [base] 146 ns +- 0 ns -> [pr2] 141 ns +- 2 ns: 1.03x faster
list(100) repeat 10: Mean +- std dev: [base] 4.35 us +- 0.05 us -> [pr2] 4.23 us +- 0.02 us: 1.03x faster
list(3) repeat 10: Mean +- std dev: [base] 267 ns +- 3 ns -> [pr2] 192 ns +- 5 ns: 1.39x faster
list(1) repeat 10: Mean +- std dev: [base] 68.8 ns +- 2.4 ns -> [pr2] 75.7 ns +- 1.8 ns: 1.10x slower
list(3) repeat inplace 10: Mean +- std dev: [base] 143 ns +- 1 ns -> [pr2] 129 ns +- 3 ns: 1.10x faster
tuple(4) repeat 10: Mean +- std dev: [base] 240 ns +- 3 ns -> [pr2] 269 ns +- 3 ns: 1.12x slower
list(100) repeat 1000: Mean +- std dev: [base] 409 us +- 1 us -> [pr2] 421 us +- 1 us: 1.03x slower
list(3) repeat 1000: Mean +- std dev: [base] 18.8 us +- 0.2 us -> [pr2] 5.42 us +- 0.28 us: 3.48x faster
list(1) repeat 1000: Mean +- std dev: [base] 2.01 us +- 0.02 us -> [pr2] 1.96 us +- 0.02 us: 1.03x faster
list(3) repeat inplace 1000: Mean +- std dev: [base] 4.93 us +- 0.16 us -> [pr2] 2.60 us +- 0.02 us: 1.90x faster
tuple(4) repeat 1000: Mean +- std dev: [base] 9.80 us +- 0.29 us -> [pr2] 6.98 us +- 0.47 us: 1.40x faster

Benchmark hidden because not significant (3): list(100) repeat 1, list(1) repeat 2, [control] list pop+append

Geometric mean: 1.14x faster
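As a rough sanity check on the geometric mean, the per-benchmark ratios listed above can be combined by hand. This is only a sketch under two assumptions: it uses the rounded ratios as printed, and it treats the three hidden (not significant) benchmarks as a ratio of exactly 1.0, which may differ from how pyperf computes its figure internally.

```python
import math

# Ratios as printed above; "slower" results are entered as 1 / ratio.
listed = [
    1.04, 1.05, 1.03, 1 / 1.03,            # repeat 1
    1.02, 1.07, 1.01, 1.03,                # repeat 2
    1.03, 1.39, 1 / 1.10, 1.10, 1 / 1.12,  # repeat 10
    1 / 1.03, 3.48, 1.03, 1.90, 1.40,      # repeat 1000
]

# Assumption: the three hidden benchmarks count as "no change".
hidden = [1.0] * 3


def geometric_mean(values):
    """Geometric mean computed via the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"{geometric_mean(listed + hidden):.2f}x faster")
```

Under these assumptions the result lands close to the reported figure; leaving out the hidden benchmarks gives a somewhat higher value.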
<details>
<summary>Code for benchmark</summary>

```python
import pyperf

runner = pyperf.Runner()

setup = 'a=[1,2,3]; a1=[1,]; t=(1,2,3,4)'

for n in [1, 2, 10, 1000]:
    runner.timeit(name=f"list(100) repeat {n}",
                  stmt=f"x=a*{n}; y=a*{n}",
                  setup='a=[1.]*100')

    runner.timeit(name=f"list(3) repeat {n}",
                  stmt=f"x=a*{n}; y=a*{n}",
                  setup=setup)

    runner.timeit(name=f"list(1) repeat {n}",
                  stmt=f"x=a1*{n};",
                  setup=setup)

    runner.timeit(name=f"list(3) repeat inplace {n}",
                  stmt=f"a=[1,2,None]; a*={n}",
                  setup=setup)

    runner.timeit(name=f"tuple(4) repeat {n}",
                  stmt=f"x=t*{n}; y=t*{n}",
                  setup=setup)

# Control benchmark: does not depend on n, so it is registered once, outside the loop.
runner.timeit(name="[control] list pop+append",
              stmt="x=a.pop(); a.append(1)",
              setup=setup)
```

</details>

@ghost

ghost commented Apr 10, 2022

Commit authors are required to sign the Contributor License Agreement.
CLA not signed

@eendebakpt eendebakpt marked this pull request as draft April 10, 2022 21:18
@eendebakpt eendebakpt changed the title from "bpo-47091: improve performance of list and tuple repeat" to "gh-91247: improve performance of list and tuple repeat" Apr 12, 2022
Contributor

@MaxwellDupre MaxwellDupre left a comment

414 tests OK.
1 test failed:
test_embed
I don't think the failure is related to this change, so with nearly all tests passing this looks OK.

@eendebakpt
Contributor Author

@MaxwellDupre Thanks for the approval. Note that this PR was marked draft. There are two PRs to resolve #91247, this one and #91482.

I have a small preference for #91482 (but this PR is also an improvement)

@eendebakpt
Contributor Author

Closing as #91482 seems a better approach

@eendebakpt eendebakpt closed this May 11, 2022
