BUG: Interchange protocol uses "u" for string format code but offsets are 8 bytes #56754

Closed
3 tasks done
WillAyd opened this issue Jan 6, 2024 · 1 comment
Labels
Bug Interchange Dataframe Interchange Protocol Strings String extension data type and string data

Comments

@WillAyd
Member

WillAyd commented Jan 6, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

When inspecting the offsets buffer of a string column sent via the interchange protocol, the buffer uses 8 bytes per entry to store each offset, even when the format code is "u".

LargeString should use 8 bytes per entry, but it has a different format code, "U".
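To make the mismatch concrete, here is a minimal stdlib-only sketch of how a consumer might read a string column: it picks the offset width from the format code ("u" means 32-bit offsets, "U" means 64-bit offsets, per the Arrow C data interface format strings) and slices the UTF-8 data buffer. The names `OFFSET_TYPECODES`, `build_buffers` and `decode_strings` are hypothetical, not part of pandas or the interchange protocol, and the sketch assumes little-endian buffers with no nulls.

```python
import struct

# Arrow format strings: "u" = utf-8 string with 32-bit offsets,
# "U" = large utf-8 string with 64-bit offsets. Values are struct typecodes.
OFFSET_TYPECODES = {"u": "i", "U": "q"}


def build_buffers(strings, format_code):
    """Hypothetical producer: encode strings into (data, offsets) byte buffers."""
    typecode = OFFSET_TYPECODES[format_code]
    data = bytearray()
    offsets = [0]
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))
    # n + 1 little-endian offsets, each with the width implied by the format code
    offsets_bytes = struct.pack(f"<{len(offsets)}{typecode}", *offsets)
    return bytes(data), offsets_bytes


def decode_strings(data, offsets_bytes, n, format_code):
    """Hypothetical consumer: trusts the format code to pick the offset width."""
    typecode = OFFSET_TYPECODES[format_code]
    offsets = struct.unpack_from(f"<{n + 1}{typecode}", offsets_bytes)
    return [data[offsets[i]:offsets[i + 1]].decode("utf-8") for i in range(n)]


data, offsets = build_buffers(["a", "bc", "def"], "U")  # 8-byte offsets
print(decode_strings(data, offsets, 3, "U"))  # ['a', 'bc', 'def']
print(decode_strings(data, offsets, 3, "u"))  # ['', 'a', '']
```

The second call mirrors the reported situation: the buffer holds 8-byte offsets but is labelled "u", so the consumer reads each 64-bit value as two 32-bit halves and silently returns wrong strings.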

Issue Description

See above.

Expected Behavior

"u" should have 4-byte offsets; "U" should have 8-byte offsets.
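One way to keep the two in sync is sketched below, assuming the offsets' bit width is available (in the interchange protocol, the second element of a buffer's dtype tuple is its bit width). Both helpers, `format_code_for_offsets` and `check_string_column`, are hypothetical illustrations, not pandas APIs and not necessarily how the linked fix works: a producer derives the code from the offsets it actually emits, and a consumer rejects a column whose code and offsets disagree.

```python
# Offset bit width implied by each Arrow string format code.
EXPECTED_OFFSET_BITS = {"u": 32, "U": 64}


def format_code_for_offsets(offset_bits):
    """Hypothetical producer-side helper: choose the code matching the offsets."""
    for code, bits in EXPECTED_OFFSET_BITS.items():
        if bits == offset_bits:
            return code
    raise ValueError(f"unsupported offset bit width: {offset_bits}")


def check_string_column(format_code, offset_bits):
    """Hypothetical consumer-side check: reject mismatched code and offsets."""
    expected = EXPECTED_OFFSET_BITS.get(format_code)
    if expected is None:
        raise ValueError(f"not a string format code: {format_code!r}")
    if expected != offset_bits:
        raise ValueError(
            f"format code {format_code!r} implies {expected}-bit offsets, "
            f"got {offset_bits}-bit"
        )


print(format_code_for_offsets(64))  # U
check_string_column("U", 64)  # consistent, no error
```

Calling `check_string_column("u", 64)` raises `ValueError`, which is the combination described in this issue.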

Installed Versions

main

@WillAyd WillAyd added Bug Interchange Dataframe Interchange Protocol labels Jan 6, 2024
@rhshadrach rhshadrach added the Strings String extension data type and string data label Jan 7, 2024
@MarcoGorelli
Member

closed by #56772
