make getitem return str #55
It occurs to me that another way to handle this that avoids copying would be to make …
I looked at making @seberg besides it being awkward that you won't be able to go from a scalar you get back from …
Dunno, I don't mind round-tripping not working in general; at that point there may not even be a point of not using … For CPython you could reach into the internals and construct the unicode object directly to avoid all overheads. Here, most of the overhead may actually come from calling through Python, but in general the copy and the object creation would matter.
Yeah, effectively this is re-implementing …
 * buffer from val_obj to the StringScalar we'd like to return. In
 * principle we could avoid this by making a C function like
 * PyUnicode_FromStringAndSize that fills a StringScalar instead of a
 * str. For now (4-11-23) we are punting on that with the expectation that
Suggested change:
- * str. For now (4-11-23) we are punting on that with the expectation that
+ * str. For now (2023-04-11) we are punting on that with the expectation that
Just joking, but ISO dates all the way!
This is halfway to the point where str is the scalar. But let's try it and see how it goes!
This makes getitem return a str instead of a StringScalar. This is substantially faster.
Before:
After:
Here's how I created arr in the above test:

The speed difference comes from avoiding an unnecessary extra copy from the string returned by PyUnicode_FromString into the StringScalar instance. In principle I could still return a StringScalar and keep this speed, but I think I'd effectively need to re-implement the low-level string-filling logic that PyUnicode_FromString uses. Since StringScalar is a trivial subclass that only exists because NumPy already has a scalar for the Python str builtin, that doesn't seem worth the trouble.

Happy to hear about alternate ideas.
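The extra copy can be illustrated with a minimal pure-Python sketch. It assumes an array element is stored as UTF-8 bytes. StringScalar here is a hypothetical local stand-in (a trivial str subclass), and getitem_as_str / getitem_as_scalar are hypothetical helpers, not the extension's actual code. The point it shows is that building a str subclass instance from an existing str always allocates a new object and copies the character data again. In CPython, by contrast, str(s) on an exact str simply returns s.

```python
import timeit


class StringScalar(str):
    """Hypothetical stand-in for the PR's StringScalar: a trivial str subclass."""


def getitem_as_str(buf: bytes) -> str:
    # Hypothetical helper: one decode, so the bytes are copied once into a
    # new str object (roughly the role PyUnicode_FromString plays in C).
    return buf.decode("utf-8")


def getitem_as_scalar(buf: bytes) -> StringScalar:
    # Hypothetical helper: decode to a str first, then build the subclass
    # instance from it. Constructing a str subclass always allocates a new
    # object and copies the data again; the intermediate str is discarded.
    return StringScalar(buf.decode("utf-8"))


if __name__ == "__main__":
    elem = b"hello world" * 10
    for fn in (getitem_as_str, getitem_as_scalar):
        seconds = timeit.timeit(lambda: fn(elem), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

Absolute timings depend on the machine and interpreter. The sketch only demonstrates that the scalar path does strictly more work (an extra allocation and copy) than returning the str directly.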