Skip to content

BUG: cast to str_ should not convert to pure-python intermediate #9978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 7, 2017

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented Nov 6, 2017

Small fix for casting to string arrays. See See pandas-dev/pandas#18123.

When casting to a string (S) array numpy currently does dst[i] = str(src[i].item()) where recall that .item() converts to a pure-python type. I think it should just do dst[i] = str(src[i]).

This comes up for floating types now that we updated their str, since now str(np.floatXX(...)) is not the same as str(np.floatXX(...).item(), especially in python2.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I thought I already did a search for this kind of problem, but I guess @name@_getitem slipped through the cracks.

@eric-wieser
Copy link
Member

Test fails due to some platforms not having f16 dtypes

@ahaldane
Copy link
Member Author

ahaldane commented Nov 6, 2017

Oh right.. then I will just remove the f16 test.

# str(np.float64(...).item()) as this would incorrectly truncate.
a = np.zeros(1, dtype='S20')
a[:] = np.array(['1.12345678901234567890'], dtype='f8')
assert_equal(a[0], b"1.1234567890123457")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add another test for S10 too?

It would be good to record what happens when the float str is too long. I'm not sure whether letting the string truncate is OK, or if it should round.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also might be tidier to use a 0d array here and .astype, but up to you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all types (not just floats) there is a fairly well established precedent that strings get simply truncated if the dtype is too short, so I don't think we should round. Integers don't round.

Also, I'm a bit hesitant to add a test for S10 because I didn't touch any string truncation code here, and getting the truncation right seems like a separate issue, to be examined and discussed on its own. This PR's purpose is to test string casting intermediates, not to test the arguably unrelated string truncation.

@charris charris merged commit 7aeb3ae into numpy:master Nov 7, 2017
@charris
Copy link
Member

charris commented Nov 7, 2017

Thanks Allan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants