base64.b85encode uses significant amount of RAM #101178
Comments
I can confirm that this still happens on
Compare it to almost-instant
I looked into the implementation of
The poor performance (compared to b64encode) is mainly due to the part
To make any significant gains, an implementation in C seems the most worthwhile.
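The gap between the two encoders can be observed with the standard `tracemalloc` module. The following is a minimal sketch, not the benchmark used in this thread: the helper name `peak_bytes` and the 64 KiB payload are assumptions chosen for illustration, and the numbers depend on the Python version in use.

```python
import base64
import tracemalloc


def peak_bytes(encode, payload):
    """Return the peak number of traced bytes allocated while encode(payload) runs.

    `peak_bytes` is a hypothetical helper for this sketch, not part of any library.
    """
    tracemalloc.start()
    try:
        encode(payload)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# A small, deterministic payload (assumption: 64 KiB is enough to show the trend).
payload = bytes(range(256)) * 256

b64_peak = peak_bytes(base64.b64encode, payload)
b85_peak = peak_bytes(base64.b85encode, payload)

print(f"b64encode peak: {b64_peak} bytes")
print(f"b85encode peak: {b85_peak} bytes")
```

On interpreters that still use the pure-Python base85 code, the second figure should be noticeably larger than the first; with a C implementation the two are expected to be much closer.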
Add a base85 encoder and decoder to `binascii` and four new functions, `binascii.a2b_ascii85()`, `a2b_base85()`, `b2a_ascii85()`, and `b2a_base85()`. These can be used to replace the existing pure-Python base85 implementation within `base64.a85encode()`, `b85encode()`, `a85decode()`, and `b85decode()`. Performance and memory usage both benefit by at least an order of magnitude. No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification. The only significant observable behavioral difference compared to the old base85 implementation should be that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now results in an error. Such data would never be produced by any base85 encoder, so it must be invalid. Resolves: gh-101178
Add a base85 encoder and decoder to `binascii` and four new functions, `binascii.a2b_ascii85()`, `a2b_base85()`, `b2a_ascii85()`, and `b2a_base85()`. These are used to replace the existing pure-Python base85 implementation within `base64.a85encode()`, `b85encode()`, `a85decode()`, and `b85decode()`. Performance and memory usage both benefit by at least an order of magnitude. No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification. Note that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error. No base85 encoder would emit such data, so it must be invalid. This should be the only notable external-facing difference in behavior compared to the old implementation. Resolves: gh-101178
Add Ascii85 and base85 encoders and decoders to `binascii` and four new functions, `binascii.a2b_ascii85()`, `a2b_base85()`, `b2a_ascii85()`, and `b2a_base85()`. These replace the existing implementations in `base64.a85encode()`, `b85encode()`, `a85decode()`, and `b85decode()`. Performance is greatly improved, and memory usage is now constant instead of linear. No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification. Note that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation. Resolves: gh-101178
Refactor code to make use of generators instead of allocating 2 potentially huge lists for large datasets
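The generator idea can also be applied from the caller's side. Because base85 encodes each 4-byte group independently, encoding chunks whose size is a multiple of 4 and concatenating the results gives the same output as encoding the whole input at once. The sketch below is not the code from the linked commit; `iter_b85_chunks` and its `chunk_size` parameter are hypothetical names used for illustration.

```python
import base64


def iter_b85_chunks(data, chunk_size=4096):
    """Yield base85-encoded pieces of `data`, one chunk at a time.

    Hypothetical helper: chunk_size must be a positive multiple of 4 so that
    only the final chunk can end in a partial group.
    """
    if chunk_size <= 0 or chunk_size % 4 != 0:
        raise ValueError("chunk_size must be a positive multiple of 4")
    view = memoryview(data)  # slicing a memoryview does not copy the input
    for start in range(0, len(view), chunk_size):
        yield base64.b85encode(view[start:start + chunk_size])


# Usage: the joined chunks match a single-call encoding of the same data.
data = bytes(range(256)) * 10 + b"abc"
chunked = b"".join(iter_b85_chunks(data, chunk_size=64))
print(chunked == base64.b85encode(data))
```

In practice each yielded piece would be written to a file or socket instead of being joined, so that only one chunk's worth of intermediate data is alive at a time.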
Add Ascii85, base85, and Z85 encoders and decoders to `binascii`, replacing the existing pure Python implementations in `base64`. No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification. Note that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation. Resolves: gh-101178
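The claim that no encoder emits data of length 1 mod 5 follows from the encoding itself: every full 4-byte group becomes 5 characters, and a trailing group of n bytes (1 to 3) becomes n + 1 characters. A short check of this with `base64.b85encode()` is sketched below; the helper name `encoded_length_residues` is an assumption made for this example. Ascii85 is left out because its `z` shorthand for all-zero groups changes the raw length, which is the quirk the commit message refers to.

```python
import base64


def encoded_length_residues(max_len=64):
    """Collect len(b85encode(x)) % 5 for inputs of 0..max_len bytes.

    Hypothetical helper for illustration only.
    """
    return {
        len(base64.b85encode(b"\x01" * n)) % 5
        for n in range(max_len + 1)
    }


residues = encoded_length_residues()
print(sorted(residues))  # → [0, 2, 3, 4]
```

The residue 1 never appears, so encoded input of that length cannot have come from a conforming encoder.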
Pinging this issue in hopes of finding a reviewer for my PR gh-102753. It's been over two years 😅
Thanks for the patience! I pinged the cpython triage team to have a look.
Bug report
On the same string:
b85encode takes up my entire RAM and crashes
On IPython:
Here is the GNU time stats:
I got the same results in Python 3.6, 3.7, 3.8, and 3.9.
Your environment
Linked PRs