gh-118750: Asymptotically faster int(string) #118751

Merged
merged 112 commits into from
May 19, 2024

tim-one
Member

@tim-one tim-one commented May 8, 2024

Adding new, but unused, _dec_str_to_int_inner(), + discussion.

@tim-one tim-one linked an issue May 8, 2024 that may be closed by this pull request
Co-authored-by: Jelle Zijlstra <[email protected]>
@nineteendo
Contributor

Skip news?

tim-one and others added 2 commits May 8, 2024 12:25
Co-authored-by: sstandre <[email protected]>
Co-authored-by: sstandre <[email protected]>
@tim-one
Member Author

tim-one commented May 8, 2024

Skip news?

Premature, I think. It's possible that a later version may become actively used.

@nineteendo
Contributor

We only need a news entry for user facing changes. When the faster integer conversion is eventually implemented, you'll be credited.

@tim-one
Member Author

tim-one commented May 8, 2024

Yes, I understand that. I'm saying that it's possible this will become a "user-facing" change soon. I already have a newer version (not yet checked in) that's much better. Mucking with labels is, IMO, a minor waste of time at this point.

@nineteendo
Contributor

Sorry, I'm just trying to help. I thought you didn't mean in the near future.

@tim-one
Member Author

tim-one commented May 8, 2024

No problem! Things change: I originally thought this was far from being ready, but that's been changing rapidly.

tim-one added 4 commits May 8, 2024 15:09
… did very much better than I recalled. So moving to that instead. The crossover point is "only" about 3.4 million digits now, far smaller.
@tim-one
Member Author

tim-one commented May 8, 2024

A cute mystery: I noticed that converting a string representing an int a little smaller than a power of 256 took about 3x longer than one a little larger than that power. Turns out there's "a reason" for that: in the latter case, the n inner() sees is frequently 0, so it cuts back to a mere 8 digits of precision to compute the integer quotient. In the former case, n is usually chock full o' 1 bits, so the precision is only cut in half.
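The bit patterns behind this observation can be seen directly. The following is only an illustrative sketch of the two kinds of input, not the `inner()` function from the PR; the helper name `byte_profile` is hypothetical.

```python
def byte_profile(n: int) -> tuple[int, int]:
    """Return (number of zero bytes, number of 1 bits) in n's base-256 form."""
    num_bytes = (n.bit_length() + 7) // 8 or 1
    data = n.to_bytes(num_bytes, "big")
    return data.count(0), bin(n).count("1")


k = 8
above = 256**k + 5   # a little larger than a power of 256
below = 256**k - 5   # a little smaller than a power of 256

# Just above the power: a leading 1 byte, then mostly zero bytes.
print(byte_profile(above))
# Just below the power: every byte is 0xFF or close to it, so almost all bits are 1.
print(byte_profile(below))
```

A value just above a power of 256 is almost entirely zero bytes, while a value just below is almost entirely 1 bits, which matches the two cases described in the comment.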

tim-one added 5 commits May 8, 2024 18:37
Add an int<->str test for a truly large int (10 million digits),
which isn't currently tested. But regrtest will skip it unless
the "cpu" resource is enabled (e.g., via "-ucpu" on the cmdline).
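A round-trip check of this kind might look roughly like the sketch below. It is not the test added in the PR: the function name `int_str_roundtrips` and its parameters are hypothetical, and the regrtest resource gating is left out, so the caller chooses the digit count.

```python
import random
import sys


def int_str_roundtrips(num_digits: int, seed: int = 0) -> bool:
    """Check that str(int(s)) == s for a random decimal string of num_digits digits."""
    rng = random.Random(seed)
    # The first digit is nonzero so the string is the canonical form of the int.
    digits = str(rng.randrange(1, 10)) + "".join(
        rng.choices("0123456789", k=num_digits - 1)
    )
    # Newer Pythons cap int<->str conversion length by default; lift the cap
    # temporarily where that API exists, and restore it afterwards.
    get_limit = getattr(sys, "get_int_max_str_digits", None)
    if get_limit is not None:
        old_limit = get_limit()
        sys.set_int_max_str_digits(0)
    try:
        return str(int(digits)) == digits
    finally:
        if get_limit is not None:
            sys.set_int_max_str_digits(old_limit)


if __name__ == "__main__":
    # A modest size for a quick run; the commit above uses 10 million digits.
    print(int_str_roundtrips(100_000))
```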
@python python deleted a comment from bedevere-bot May 19, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Asymptotically faster (O(n log n)) str->int for very large strings, leveraging the faster multiplication scheme in the C-coded `_decimal` when available. This is used instead of the current Karatsuba-limited method starting at 2 million digits.

Lots of opportunity remains for fine-tuning. Good targets include changing BYTELIM, and possibly changing the internal output base (from 256 to a higher number of bytes).

Doing this was substantial work, and many of the new lines are actually comments giving correctness proofs. The obvious approaches sticking to integers were too slow to be useful, so this is doing variable-precision decimal floating-point arithmetic. Much faster, but worst-possible rounding errors have to be wholly accounted for, using as little precision as possible.
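As a rough illustration of the trade-off described here (limited-precision decimal arithmetic whose rounding error must be accounted for), the sketch below computes an integer quotient through the `decimal` module. It is not the algorithm in this commit: the name `floor_div_via_decimal` and the `guard_digits` parameter are hypothetical, and where the commit proves tight error bounds in advance, this sketch simply repairs any rounding error with an exact integer check at the end.

```python
import decimal


def floor_div_via_decimal(n: int, d: int, guard_digits: int = 4) -> int:
    """Compute n // d for n >= 0, d > 0 using limited-precision decimal floats."""
    if n < 0 or d <= 0:
        raise ValueError("requires n >= 0 and d > 0")
    # The quotient has at most this many decimal digits.
    quotient_digits = max(len(str(n)) - len(str(d)) + 1, 1)
    # Use only as much precision as the quotient needs, plus a few guard digits.
    ctx = decimal.Context(prec=quotient_digits + guard_digits)
    recip = ctx.divide(decimal.Decimal(1), decimal.Decimal(d))  # rounded 1/d
    approx = ctx.multiply(decimal.Decimal(n), recip)            # rounded n/d
    q = int(approx)
    # Rounding may leave q slightly off; fix it with exact integer checks.
    while q * d > n:
        q -= 1
    while (q + 1) * d <= n:
        q += 1
    return q


if __name__ == "__main__":
    n = 10**50 + 12345
    d = 256**7
    print(floor_div_via_decimal(n, d) == n // d)
```

With too few guard digits the correction loops do more work, and with too many the floating-point steps cost more, which is the balance the commit message refers to.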

Special thanks to Serhiy Storchaka for asking many good questions in his code reviews!

Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: sstandre <[email protected]>
Co-authored-by: Pieter Eendebak <[email protected]>
Co-authored-by: Nice Zombies <[email protected]>