Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Apr 12, 2022

Compared to current main branch there are 3 improvements

  • For the list inplace repeat there is an optimized copy method
  • This PR solves the inefficient pointer chasing for small (size 2 to 7) lists and tuples
  • It shares code between the implementations of list repeat, tuple repeat and list inplace repeat

This PR keeps the specializations for repeats of lists and tuples with length 1. An alternative PR without specialization for n=1 (#32045) was rejected because the specializations might yield a speedup.

For performance comparisons to main and the version without specializations and more detailed discussion see issue #91247.

The inefficient pointer chasing can be seen in the following performance benchmark:

pc

(results obtained with runner.timeit(name=f"list({n}) repeat {r}", stmt=f"x=a*{r}", setup=f'a=list(range({n}))'))

@eendebakpt eendebakpt marked this pull request as draft April 12, 2022 19:14
@eendebakpt eendebakpt marked this pull request as ready for review April 12, 2022 21:26
@eendebakpt eendebakpt force-pushed the performance/list_repeat_v2_specialized branch from 05e2b70 to 5445cfb Compare April 21, 2022 18:20
@eendebakpt
Copy link
Contributor Author

@sweeneyde Could you have a look at this ticket? The approach is similar to #31856 and #31999 which you reviewed.

@eendebakpt eendebakpt force-pushed the performance/list_repeat_v2_specialized branch from 1491705 to 66cc8ae Compare May 11, 2022 08:39
@sweeneyde sweeneyde requested review from sweeneyde and vstinner May 24, 2022 12:28
@sweeneyde
Copy link
Member

@eendebakpt Sorry this slipped under my radar

@vstinner, would you mind reviewing? I think the change is good in principle but I want another reviewer and to make sure the header file changes are appropriate.

Copy link
Contributor

@kumaraditya303 kumaraditya303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits

Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits, but this is looking good. Thanks for your patience and persistence on this.

@sweeneyde
Copy link
Member

I verified a benchmark on a Windows machine since this might be a bit compiler-specific.

pyperf timeit -s "a = [0,1]" "a * 10_000"
Mean +- std dev: [before] 32.9 us +- 0.3 us -> [after] 21.8 us +- 0.3 us: 1.51x faster

Looks great!

@sweeneyde sweeneyde added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 25, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @sweeneyde for commit 516f091 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 25, 2022
*dest++ = *src++;
}

_Py_memory_repeat((char *)np->ob_item, sizeof(PyObject *)*output_size,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this cannot overflow because list_new_prealloc uses PyMem_New, and so this multiplication already succeeded at some point.

@sweeneyde sweeneyde merged commit 2ef73be into python:main Jul 26, 2022
@eendebakpt eendebakpt deleted the performance/list_repeat_v2_specialized branch July 26, 2022 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants