gh-118750: Asymptotically faster int(string)
#118751
Co-authored-by: Jelle Zijlstra <[email protected]>
Skip news?
Co-authored-by: sstandre <[email protected]>
Premature, I think. It's possible that a later version may become actively used.
We only need a news entry for user-facing changes. When the faster integer conversion is eventually implemented, you'll be credited.
Yes, I understand that. I'm saying that it's possible this will become a "user-facing" change soon. I already have a newer version (not yet checked in) that's much better. Mucking with labels is, IMO, a minor waste of time at this point.
Sorry, I'm just trying to help. I thought you didn't mean in the near future.
No problem! Things change: I originally thought this was far from being ready, but that's been changing rapidly.
… did very much better than I recalled. So moving to that instead. The crossover point is "only" about 3.4 million digits now, far smaller.
A cute mystery: I noticed that converting a string representing an int a little smaller than a power of 256 took about 3x longer than one a little larger than that power. Turns out there's "a reason" for that: in the latter case, the …
Add an int<->str test for a truly large int (10 million digits), which isn't currently tested. But regrtest will skip it unless the "cpu" resource is enabled (e.g., via "-ucpu" on the cmdline).
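As an illustration of what such a round-trip check might look like, here is a minimal sketch. It is not the test added in this commit: the function name `int_str_roundtrip_ok` and the digit pattern are hypothetical, and it skips the regrtest resource gating entirely. It only assumes the `sys.set_int_max_str_digits()` limit found in recent Python versions, which has to be lifted before converting strings this long.

```python
import sys


def int_str_roundtrip_ok(ndigits: int) -> bool:
    """Check that str -> int -> str reproduces an ndigits-long decimal string.

    Hypothetical helper for illustration; not the test from the PR.
    """
    # A leading nonzero digit guarantees the int has exactly ndigits digits.
    s = "8" + "7" * (ndigits - 1)
    has_limit = hasattr(sys, "set_int_max_str_digits")
    old = sys.get_int_max_str_digits() if has_limit else None
    if has_limit:
        sys.set_int_max_str_digits(0)  # 0 disables the conversion limit
    try:
        n = int(s)
        return str(n) == s
    finally:
        if has_limit:
            sys.set_int_max_str_digits(old)  # restore the caller's setting
```

Calling `int_str_roundtrip_ok(10_000_000)` would exercise the size mentioned above, but it can take a long time, which is presumably why the real test is gated behind the "cpu" resource.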
Asymptotically faster (O(n log n)) str->int for very large strings, leveraging the faster multiplication scheme in the C-coded `_decimal` when available. This is used instead of the current Karatsuba-limited method starting at 2 million digits.

Lots of opportunity remains for fine-tuning. Good targets include changing BYTELIM, and possibly changing the internal output base (from 256 to a higher number of bytes).

Doing this was substantial work, and many of the new lines are actually comments giving correctness proofs. The obvious approaches sticking to integers were too slow to be useful, so this is doing variable-precision decimal floating-point arithmetic. Much faster, but worst-possible rounding errors have to be wholly accounted for, using as little precision as possible.

Special thanks to Serhiy Storchaka for asking many good questions in his code reviews!

Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: sstandre <[email protected]>
Co-authored-by: Pieter Eendebak <[email protected]>
Co-authored-by: Nice Zombies <[email protected]>
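To make the general shape of the idea concrete, here is a rough sketch of converting a decimal string to base 256 by recursive splitting with the `decimal` module. It is deliberately NOT the method in this commit: it uses exact, integer-valued `Decimal` arithmetic (`divmod` by powers of 256), which is probably closer to the "obvious" integer-only approach the message calls too slow, and it needs no rounding-error analysis. The function name `str_to_int_via_decimal`, the `bytelim` parameter and its default are assumptions made for illustration; only the notions of a byte cutoff and an output base of 256 come from the message above.

```python
import decimal


def str_to_int_via_decimal(s: str, bytelim: int = 64) -> int:
    """Exact divide-and-conquer sketch; illustrative, not the PR's algorithm."""
    if not (s.isascii() and s.isdigit()):
        raise ValueError("expected a nonempty string of ASCII decimal digits")
    if bytelim < 1:
        raise ValueError("bytelim must be at least 1")
    D = decimal.Decimal
    ndigits = len(s)
    # Enough bytes that 256**w > 10**ndigits (log(10)/log(256) < 0.416).
    w = ndigits * 416 // 1000 + 1
    # Precision large enough that every operation below is exact; trapping
    # Inexact turns any accidental rounding into an exception.
    ctx = decimal.Context(prec=ndigits + 10,
                          Emax=decimal.MAX_EMAX, Emin=decimal.MIN_EMIN)
    ctx.traps[decimal.Inexact] = True
    pow_cache = {}

    def pow256(k):
        # 256**k as a Decimal, built from smaller cached powers.
        if k not in pow_cache:
            if k <= bytelim:
                pow_cache[k] = D(256 ** k)
            else:
                half = k // 2
                pow_cache[k] = pow256(half) * pow256(k - half)
        return pow_cache[k]

    def inner(n, width):
        # n is an integer-valued Decimal with 0 <= n < 256**width.
        if width <= bytelim:
            return int(n).to_bytes(width, "big")
        low_width = width // 2
        hi, lo = divmod(n, pow256(low_width))
        return inner(hi, width - low_width) + inner(lo, low_width)

    with decimal.localcontext(ctx):
        digits = inner(D(s), w)
    return int.from_bytes(digits, "big")
```

The base case hands small pieces to the ordinary `int` machinery, so the cutoff controls how much work stays in `decimal`. A production version would write into a preallocated buffer rather than concatenating `bytes`, and would replace the exact `divmod` with the lower-precision arithmetic and error bounds the message describes.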
Adding new, but unused, `_dec_str_to_int_inner()`, + discussion.
int(string)
#118750