PERF: Improve pickle support with BZ2 & LZMA #49068
Merged
49 commits (all by jakirkham):
b3e1bc5 Add `BZ2File` wrapper for pickle protocol 5
17f725b Add `LZMAFile` wrapper for pickle protocol 5
280731e Use BZ2 & LZMA wrappers for full pickle support
ccda94e Workaround linter issue
08c37e5 Refactor out `flatten_buffer`
8109338 Refactor `B2File` into separate module
3c498bd Merge pandas-dev/main into jakirkham/fix_pickle5
691eba7 Test `flatten_buffer`
7a93b70 Move `flatten_buffer` to `_utils`
8f5b0a1 Import `annotations` to fix `|` usage
b5ce67c Merge pandas-dev/main into jakirkham/fix_pickle5
c54529a Sort `import`s to fix lint
7604d48 Patch `BZ2File` & `LZMAFile` on Python pre-3.10
9f3d387 Test C & F contiguous NumPy arrays
16f21a5 Test `memoryview` is 1-D `uint8` contiguous data
6df7e08 Run `black` on `bz2` and `lzma` compat files
39ffab0 One more lint fix
f134dee Drop unused `PickleBuffer` `import`s
a7126a2 Simplify change to `panda.compat.__init__`
df7b0ce Type `flatten_buffer` result
742788f Use `order="A"` in `memoryview.tobytes(...)`
ffc58d3 Move all compat compressors into a single file
0b1be16 Fix `BZ2File` `import`
5a6ea45 Merge pandas-dev/main into jakirkham/fix_pickle5
f00740c Refactor out common compat constants
06e5387 Fix `import` sorting
269dc0f Drop unused `import`
f1f1a2e Ignore `flake8` errors on wildcard `import`
f73f0a5 Revert "Ignore `flake8` errors on wildcard `import`"
b8a724b Explicitly `import` all constants
1b11188 Assign `IS64` first
d2a39db Try `noqa` on wildcard `import` again
01e8604 Declare `BZ2File` & `LZMAFile` once
523e20c `import PickleBuffer` for simplicity
6f7e293 Add `bytearray` to return type
4614bd7 Test `bytes` & `bytearray` are returned unaltered
818e08d Merge branch 'main' into fix_pickle5
f33ed7a Merge pandas-dev/main into jakirkham/fix_pickle5
cf4f926 Explicit list all constants
4a2efad Trick linter into thinking constants are used ;)
b18a3f0 Merge pandas-dev/main into jakirkham/fix_pickle5
0dae476 Add new entry to 2.0.0
366f645 Assign constants to themselves
092e726 Update changelog entry [skip ci]
03b8eac Merge pandas-dev/main into jakirkham/fix_pickle5
e49ba4f Add constants to `__all__`
453b4e3 Update changelog entry [ci skip]
30124dd Use Sphinx method annotation
72aeff2 Merge pandas-dev/main into jakirkham/fix_pickle5
New file, 69 lines (`@@ -0,0 +1,69 @@`):

```python
"""
Patched ``BZ2File`` and ``LZMAFile`` to handle pickle protocol 5.
"""

from __future__ import annotations

import bz2
from pickle import PickleBuffer

from pandas.compat._constants import PY310

try:
    import lzma

    has_lzma = True
except ImportError:
    has_lzma = False


def flatten_buffer(
    b: bytes | bytearray | memoryview | PickleBuffer,
) -> bytes | bytearray | memoryview:
    """
    Return some 1-D `uint8` typed buffer.

    Coerces anything that does not match that description to one that does
    without copying if possible (otherwise will copy).
    """

    if isinstance(b, (bytes, bytearray)):
        return b

    if not isinstance(b, PickleBuffer):
        b = PickleBuffer(b)

    try:
        # coerce to 1-D `uint8` C-contiguous `memoryview` zero-copy
        return b.raw()
    except BufferError:
        # perform in-memory copy if buffer is not contiguous
        return memoryview(b).tobytes("A")


class BZ2File(bz2.BZ2File):
    if not PY310:

        def write(self, b) -> int:
            # Workaround issue where `bz2.BZ2File` expects `len`
            # to return the number of bytes in `b` by converting
            # `b` into something that meets that constraint with
            # minimal copying.
            #
            # Note: This is fixed in Python 3.10.
            return super().write(flatten_buffer(b))


if has_lzma:

    class LZMAFile(lzma.LZMAFile):
        if not PY310:

            def write(self, b) -> int:
                # Workaround issue where `lzma.LZMAFile` expects `len`
                # to return the number of bytes in `b` by converting
                # `b` into something that meets that constraint with
                # minimal copying.
                #
                # Note: This is fixed in Python 3.10.
                return super().write(flatten_buffer(b))
```
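The `write` overrides exist because, before Python 3.10, `bz2.BZ2File.write` and `lzma.LZMAFile.write` treated `len(b)` as the number of bytes written, which only holds for flat byte buffers. The standard-library-only sketch below (illustrative, not part of the PR) shows that mismatch for a `memoryview` with 2-byte items, then walks through the same two branches `flatten_buffer` takes: a zero-copy `PickleBuffer.raw()` for contiguous data and a `tobytes("A")` copy otherwise. The buffers are made-up examples.

```python
from pickle import PickleBuffer

# Eight bytes reinterpreted as four unsigned 16-bit integers.
view = memoryview(bytearray(range(8))).cast("H")

# len() counts items, not bytes: this is the number pre-3.10
# BZ2File/LZMAFile.write relied on.
print(len(view), view.nbytes)  # 4 8

# Contiguous case: raw() gives a zero-copy 1-D view of unsigned bytes,
# so len() now equals the byte count.
flat = PickleBuffer(view).raw()
print(flat.format, flat.ndim, len(flat))  # B 1 8

# Non-contiguous case: every other byte of a buffer. raw() raises
# BufferError, so fall back to a contiguous copy, as flatten_buffer does.
strided = memoryview(bytearray(range(8)))[::2]
try:
    copied = PickleBuffer(strided).raw()
except BufferError:
    copied = memoryview(PickleBuffer(strided)).tobytes("A")
print(copied)  # b'\x00\x02\x04\x06'
```

Only the non-contiguous path pays for a copy; `bytes` and `bytearray` inputs skip both branches because their `len()` is already a byte count.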
New file, 27 lines (`@@ -0,0 +1,27 @@`):

```python
"""
_constants
======

Constants relevant for the Python implementation.
"""

from __future__ import annotations

import platform
import sys

IS64 = sys.maxsize > 2**32

PY39 = sys.version_info >= (3, 9)
PY310 = sys.version_info >= (3, 10)
PY311 = sys.version_info >= (3, 11)
PYPY = platform.python_implementation() == "PyPy"


__all__ = [
    "IS64",
    "PY39",
    "PY310",
    "PY311",
    "PYPY",
]
```
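`PY310` is a plain boolean computed once at import. The `BZ2File` and `LZMAFile` wrappers test it inside their class bodies, so the `write` override only exists on interpreters older than 3.10. The sketch below reproduces that pattern with hypothetical names (`PatchedBZ2File`, and `_flatten` as a simplified stand-in for `flatten_buffer`) and round-trips a protocol 5 `PickleBuffer` through an in-memory BZ2 stream. It is illustrative, not pandas' own code.

```python
import bz2
import io
import pickle
import sys
from pickle import PickleBuffer

PY310 = sys.version_info >= (3, 10)


def _flatten(b):
    # Hypothetical, simplified stand-in for the PR's flatten_buffer helper.
    if isinstance(b, (bytes, bytearray)):
        return b
    try:
        return PickleBuffer(b).raw()  # zero-copy 1-D unsigned-byte view
    except BufferError:
        return memoryview(b).tobytes("A")  # copy non-contiguous data


class PatchedBZ2File(bz2.BZ2File):
    # The condition runs once, when the class body executes; on 3.10+
    # the class simply inherits bz2.BZ2File.write unchanged.
    if not PY310:

        def write(self, b) -> int:
            return super().write(_flatten(b))


# A PickleBuffer pickled in-band under protocol 5. Large payloads like this
# may be handed to write() as-is instead of being copied into a frame.
payload = PickleBuffer(bytearray(b"x" * 200_000))

compressed = io.BytesIO()
with PatchedBZ2File(compressed, "wb") as f:
    pickle.dump(payload, f, protocol=5)

# Closing a BZ2File that wraps a caller-supplied file object leaves it open.
compressed.seek(0)
with bz2.BZ2File(compressed, "rb") as f:
    restored = pickle.load(f)

print(type(restored).__name__, len(restored))
```

On 3.10 and later `PatchedBZ2File.__dict__` has no `write` entry, so the standard library's fixed method is used directly; on older interpreters the override flattens each buffer before `len()` is taken.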