Closed
Description
Currently, the interchange fails with the large-string type:
import pyarrow as pa
from pandas.core.interchange.from_dataframe import from_dataframe

arr = ["foo", "bar"]
# Build a table whose only column has the large_string type
table = pa.table({"arr": pa.array(arr, "large_string")})
exchange_df = table.__dataframe__()
from_dataframe(exchange_df)  # raises AssertionError
I get:
Traceback (most recent call last):
  File "t.py", line 30, in <module>
    from_dataframe(exchange_df)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 52, in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy))
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 73, in _from_dataframe
    pandas_df = protocol_df_chunk_to_pandas(chunk)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 125, in protocol_df_chunk_to_pandas
    columns[name], buf = string_column_to_ndarray(col)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 242, in string_column_to_ndarray
    assert protocol_data_dtype[1] == 8  # bitwidth == 8
AssertionError
This is an issue when interchanging from polars, which uses large-string: pola-rs/polars#8377
What should be done in this case? Where do we go from here?
Note that if I try adding large-string to the ArrowCTypes in pandas, then it "just works", but that's probably not the solution?
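For context, here is a minimal, standard-library-only sketch of how the two Arrow string layouts differ. In the Arrow C data interface, the format string "u" denotes a utf-8 string with 32-bit offsets and "U" a large utf-8 string with 64-bit offsets. The names `_OFFSET_CODES` and `decode_utf8_column` are hypothetical and made up for illustration; this is not the pandas implementation, only a rough picture of what accepting both format strings could involve.

```python
import struct

# Arrow C data interface format strings:
# "u" = utf-8 string (32-bit offsets), "U" = large utf-8 string (64-bit offsets).
STRING = "u"
LARGE_STRING = "U"

# Hypothetical lookup table: format string -> struct code for a single offset.
_OFFSET_CODES = {STRING: "i", LARGE_STRING: "q"}


def decode_utf8_column(data: bytes, offsets: bytes, fmt: str) -> list[str]:
    """Decode an Arrow-style string column from its data and offsets buffers.

    Illustrative helper only (not part of pandas or pyarrow); assumes
    little-endian offsets and no null values.
    """
    if fmt not in _OFFSET_CODES:
        raise NotImplementedError(f"unsupported format string: {fmt!r}")
    code = _OFFSET_CODES[fmt]
    width = struct.calcsize("<" + code)  # 4 bytes for "i", 8 bytes for "q"
    n = len(offsets) // width
    bounds = struct.unpack(f"<{n}{code}", offsets)
    # Consecutive offsets delimit each string inside the shared data buffer.
    return [data[start:stop].decode("utf-8") for start, stop in zip(bounds, bounds[1:])]


data = b"foobar"
small_offsets = struct.pack("<3i", 0, 3, 6)  # string layout
large_offsets = struct.pack("<3q", 0, 3, 6)  # large-string layout
print(decode_utf8_column(data, small_offsets, STRING))
print(decode_utf8_column(data, large_offsets, LARGE_STRING))
```

In this sketch the data buffer is identical in both cases and only the offset width changes, which may be why registering the extra format string appears to "just work".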