
Interchange protocol - large-string? #150

Closed
@MarcoGorelli

Currently, interchange fails for columns of `large_string` type:

```python
import pyarrow as pa

arr = ["foo", "bar"]
table = pa.table(
    {"arr": pa.array(arr, 'large_string')}
)
exchange_df = table.__dataframe__()

from pandas.core.interchange.from_dataframe import from_dataframe
from_dataframe(exchange_df)
```

I get:

```
Traceback (most recent call last):
  File "t.py", line 30, in <module>
    from_dataframe(exchange_df)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 52, in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy))
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 73, in _from_dataframe
    pandas_df = protocol_df_chunk_to_pandas(chunk)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 125, in protocol_df_chunk_to_pandas
    columns[name], buf = string_column_to_ndarray(col)
  File "/home/marcogorelli/pandas-dev/pandas/core/interchange/from_dataframe.py", line 242, in string_column_to_ndarray
    assert protocol_data_dtype[1] == 8  # bitwidth == 8
AssertionError
```

This is an issue when interchanging from polars, which uses large-string: pola-rs/polars#8377

What should be done in this case? Where do we go from here?

Note that if I add large-string to `ArrowCTypes` in pandas, it "just works", but that's probably not the right solution?
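A possible reason it "just works": in the Arrow columnar format, `string` (C data interface format string `"u"`) and `large_string` (`"U"`) both keep UTF-8 bytes in the data buffer and differ only in the width of the offsets buffer (32-bit vs 64-bit). The sketch below illustrates this with a hypothetical helper, `decode_string_buffers`, which is not the pandas implementation; it assumes little-endian offsets and a column with no nulls.

```python
import struct

# Arrow C data interface format strings mapped to struct codes for the offsets:
#   "u" = utf-8 string, 32-bit signed offsets
#   "U" = large utf-8 string, 64-bit signed offsets
OFFSET_STRUCT_CODE = {"u": "i", "U": "q"}


def decode_string_buffers(format_str: str, data: bytes, offsets: bytes, length: int) -> list:
    """Hypothetical helper: rebuild Python strings from Arrow-style buffers.

    Assumes little-endian offsets and no nulls. The data buffer holds
    1-byte UTF-8 code units for both formats; only the offset width differs.
    """
    code = OFFSET_STRUCT_CODE[format_str]
    # A column of `length` strings has `length + 1` offsets.
    offs = struct.unpack_from(f"<{length + 1}{code}", offsets)
    return [data[offs[i]:offs[i + 1]].decode("utf-8") for i in range(length)]


data = b"foobar"
small_offsets = struct.pack("<3i", 0, 3, 6)  # as for pa.string()
large_offsets = struct.pack("<3q", 0, 3, 6)  # as for pa.large_string()
print(decode_string_buffers("u", data, small_offsets, 2))  # ['foo', 'bar']
print(decode_string_buffers("U", data, large_offsets, 2))  # ['foo', 'bar']
```

Since the data buffer is read byte by byte in both cases, a consumer that picks the offset width from the format string can handle both types with the same code path.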
