ENH: Implement DataFrame interchange protocol #46141
Merged
49 commits
ac58967
Vendor smoke tests from consortium
vnlitvinov fce881e
Vendor dataframe_protocol spec
vnlitvinov 02946f8
Copy over the prototype and polish it a bit
vnlitvinov 14fd478
Fix the protocol spec
vnlitvinov 4515011
Enable pd.DataFrame.__dataframe__
vnlitvinov 7d6fd5b
Align spec with existing implementations
vnlitvinov 5d64c4a
Fix protocol tests
vnlitvinov b36fd46
Make DataFrame.__dataframe__ pass protocol tests
vnlitvinov d334b20
Explicitly mark abstract methods in spec
vnlitvinov 014165d
Add more smoke tests
vnlitvinov def54ba
Implement column chunking
vnlitvinov 8e6b882
Fix tests formatting
vnlitvinov 282c85d
Start implementing chunk support in from_df
vnlitvinov 9fbb58d
Test buffer contents if on CPU
vnlitvinov dd93625
Improve spec a bit
vnlitvinov 07c8fae
Beautify spec whitespace
vnlitvinov b74c06e
Use constants from spec enums, beautify a bit
vnlitvinov 6637a29
Format by black
vnlitvinov 0883406
Format exchange tests by black
vnlitvinov 49418d2
Respond to review - move files around
vnlitvinov 78aebaa
Separate buffer and column implementations
vnlitvinov 1b64ae2
Mimick what Modin did
vnlitvinov 870ad21
Make spec tests pass
vnlitvinov edefc8f
Add tests for dtype_to_arrow_c_fmt
vnlitvinov 7144cf2
Fix test declarations, some impl bugs remain
vnlitvinov 0dc1e58
Fix .describe_categoricals and some tests
vnlitvinov 0f7c654
Auto-fix some pre-commit checks
vnlitvinov 522a66a
Fix more issues found by commit checks
vnlitvinov 1525320
Fix categorical-related test failures
vnlitvinov 0054c15
Add a whatsnew entry
vnlitvinov f8badc6
Fix rst linting
vnlitvinov 86005d4
Fix DataFrame.__dataframe__ docstring
vnlitvinov 9ab797b
Fix DataFrame.__dataframe__ docstring more
vnlitvinov 7a54b20
Fix test_api::TestApi
vnlitvinov 65a5370
Try to fix typecheck issues
vnlitvinov 594ac53
Respond to review comments
vnlitvinov 62c43af
Fix mypy error
vnlitvinov cacc9f1
Change check for dlpack
vnlitvinov 804aa89
Address review comments
vnlitvinov d1c0d56
Remove dead elif branch
vnlitvinov 5d98ebf
Fix tests broken by .column_names change
vnlitvinov 60379e5
Add tests for datetime dtype
vnlitvinov 497ca24
Fix from_dataframe docstring
vnlitvinov 39f5a5c
Add tests for uint dtype
vnlitvinov d73558a
Handle string dtype better
vnlitvinov 4ed35bf
Add test for mixed object dtype
vnlitvinov 2fca3c0
Rename spec test for clarity
vnlitvinov f030d9f
Add missing test cases in test_dtype_to_arrow_c_fmt
vnlitvinov cc94e57
Add comments explaing magic dtype numbers
vnlitvinov
```diff
@@ -1,5 +1,6 @@
 """ public toolkit API """
 from pandas.api import (  # noqa:F401
+    exchange,
     extensions,
     indexers,
     types,
```
@@ -0,0 +1,8 @@
```python
"""
Public API for DataFrame exchange protocol.
"""

from pandas.core.exchange.dataframe_protocol import DataFrame
from pandas.core.exchange.from_dataframe import from_dataframe

__all__ = ["from_dataframe", "DataFrame"]
```
Empty file.
@@ -0,0 +1,80 @@
```python
from typing import (
    Optional,
    Tuple,
)

import numpy as np
from packaging import version

from pandas.core.exchange.dataframe_protocol import (
    Buffer,
    DlpackDeviceType,
)

_NUMPY_HAS_DLPACK = version.parse(np.__version__) >= version.parse("1.22.0")


class PandasBuffer(Buffer):
    """
    Data in the buffer is guaranteed to be contiguous in memory.
    """

    def __init__(self, x: np.ndarray, allow_copy: bool = True) -> None:
        """
        Handle only regular columns (= numpy arrays) for now.
        """
        if not x.strides == (x.dtype.itemsize,):
            # The protocol does not support strided buffers, so a copy is
            # necessary. If that's not allowed, we need to raise an exception.
            if allow_copy:
                x = x.copy()
            else:
                raise RuntimeError(
                    "Exports cannot be zero-copy in the case "
                    "of a non-contiguous buffer"
                )

        # Store the numpy array in which the data resides as a private
        # attribute, so we can use it to retrieve the public attributes
        self._x = x

    @property
    def bufsize(self) -> int:
        """
        Buffer size in bytes.
        """
        return self._x.size * self._x.dtype.itemsize

    @property
    def ptr(self) -> int:
        """
        Pointer to start of the buffer as an integer.
        """
        return self._x.__array_interface__["data"][0]

    def __dlpack__(self):
        """
        Represent this structure as DLPack interface.
        """
        if _NUMPY_HAS_DLPACK:
            return self._x.__dlpack__()
        raise NotImplementedError("__dlpack__")

    def __dlpack_device__(self) -> Tuple[DlpackDeviceType, Optional[int]]:
        """
        Device type and device ID for where the data in the buffer resides.
        """
        return (DlpackDeviceType.CPU, None)

    def __repr__(self) -> str:
        return (
            "PandasBuffer("
            + str(
                {
                    "bufsize": self.bufsize,
                    "ptr": self.ptr,
                    "device": self.__dlpack_device__()[0].name,
                }
            )
            + ")"
        )
```
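A note for readers of the diff: the stride check in `PandasBuffer.__init__` and the `bufsize` / `ptr` properties can be tried out with NumPy alone. The sketch below is illustrative and not part of this PR. `as_contiguous` is a hypothetical helper that mirrors the constructor's check, and reading the data back through `ctypes` is only one assumed way a consumer might use the pointer.

```python
import ctypes

import numpy as np


def as_contiguous(x: np.ndarray, allow_copy: bool = True) -> np.ndarray:
    """Hypothetical helper mirroring the check in PandasBuffer.__init__.

    Returns a 1-D array whose elements sit back to back in memory,
    copying only when the input is strided and copying is allowed.
    """
    if x.strides != (x.dtype.itemsize,):
        if not allow_copy:
            raise RuntimeError(
                "Exports cannot be zero-copy in the case "
                "of a non-contiguous buffer"
            )
        x = x.copy()
    return x


base = np.arange(10, dtype=np.int64)
view = base[::2]  # every second element: stride is 16 bytes, itemsize is 8

packed = as_contiguous(view)  # strided input, so a packed copy is made

# Same formulas as the bufsize and ptr properties in the diff
bufsize = packed.size * packed.dtype.itemsize
ptr = packed.__array_interface__["data"][0]

# A consumer can reinterpret the raw memory; `packed` must stay alive
raw = (ctypes.c_int64 * packed.size).from_address(ptr)
values = list(raw)
```

An array that is already packed is returned as the same object, so `as_contiguous(base, allow_copy=False)` succeeds, while `as_contiguous(view, allow_copy=False)` raises `RuntimeError`. This matches the zero-copy guarantee that the `allow_copy` flag is meant to enforce.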