Skip to content

bpo-43706: Use PEP 590 vectorcall to speed up enumerate() #25154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 21, 2021

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Apr 2, 2021

https://bugs.python.org/issue43706

Benchmark base vectorcall
bench enumerate 533 ns 341 ns: 1.56x faster
Benchmark base (PGO+LTO) vectorcall (PGO+LTO)
bench enumerate 384 ns 277 ns: 1.39x faster

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against such micro-optimization, but I'm not convinced that it's worth it compared to the size of the code to parse arguments: enum_vectorcall() code. If bpo-43447 is implemented, I would be more comfortable to accept such micro-optimization. Right now, it adds many lines of code that should be maintained manually.

The micro-benchmark is measure the creation of the enumerate object, it does not iterate it. I expect for that long sequence, the benefit is not significant. But for short sequence, it is more likely interesting.

@@ -80,6 +80,45 @@ enum_new_impl(PyTypeObject *type, PyObject *iterable, PyObject *start)
return (PyObject *)en;
}

// TODO: Use AC when bpo-43447 is supported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to track these tasks in the bpo rather than directly in the Python source code.

@corona10
Copy link
Member Author

corona10 commented Apr 6, 2021

but I'm not convinced that it's worth it compared to the size of the code to parse arguments:

This is expected worry when I try to implement this. I also agree with you.
So l change my mind to update AC to support vectorcall rather than merge this PR directly.

@github-actions
Copy link

github-actions bot commented Jun 3, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 3, 2021
@corona10 corona10 removed the stale Stale PR or inactive for long period of time. label Oct 21, 2021
@vstinner
Copy link
Member

Can you please re-run you benchmark? Maybe Python performance changed in the meanwhile.

@corona10
Copy link
Member Author

Can you please re-run you benchmark? Maybe Python performance changed in the meanwhile.

Same effect.

Benchmark base vectorcall
bench enumerate 533 ns 341 ns: 1.56x faster

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

bench enumerate | 533 ns | 341 ns: 1.56x faster is worth it.

@corona10 corona10 merged commit 83f202a into python:main Oct 21, 2021
@corona10 corona10 deleted the bpo-43706 branch October 21, 2021 23:20
JelleZijlstra added a commit to JelleZijlstra/cpython that referenced this pull request Jan 26, 2022
As mentioned in the issue, this fixes a regression in 3.11.

The regression was introduced in pythonGH-25154 (bpo-43706). There were already
comments there about how this was too much code for a simple change. This
makes it even worse.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants