ENH: from_records returns dtypes respecting input numpy dtypes #55081
Comments
In pandas the
See https://pandas.pydata.org/docs/user_guide/text.html#text-types for more information about how text types are handled.
@hedeershowk Thanks for replying. I know we can set the Edit:
So you're proposing that pandas will convert a numpy U10 type into an
@hedeershowk It seems pandas doesn't support all NumPy structured array dtypes, but it does support some, for example:
>>> df['name'] = df['name'].astype("S10")
>>> df.dtypes
name |S10
age int16
weight float32
dtype: object
What I proposed is that pandas should try to use the NumPy dtypes for string like
It would be better if pandas supported all NumPy structured array types so that the
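A minimal, self-contained reproduction of the behavior discussed in this thread, assuming a recent pandas release. The field names and values are made up for illustration. The numeric fields keep their NumPy dtypes, while the 'U10' field does not come back as a fixed-width NumPy string: on pandas 2.x it is `object`, and releases with string inference enabled may report a string dtype instead.

```python
import numpy as np
import pandas as pd

# Illustrative structured array mixing a fixed-width unicode field ('U10')
# with a 16-bit integer field and a 32-bit float field.
data = np.array(
    [("alice", 30, 55.5), ("bob", 42, 80.25)],
    dtype=[("name", "U10"), ("age", "i2"), ("weight", "f4")],
)

df = pd.DataFrame.from_records(data)

# Compare each field's NumPy dtype with the dtype pandas chose for the column.
for field in data.dtype.names:
    print(f"{field:<7} numpy={str(data.dtype[field]):<8} pandas={df[field].dtype}")
```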
👍 Okay, makes sense. I think that's definitely a contributor question and not for me. Seems that current
This is intentional, I think. Prior discussion can be found in #10351.
I read the discussion, and it seems the main argument was this comment: #10351 (comment), which raised the question of what happens if a column has a dtype like
>>> import numpy as np
>>> import pandas as pd
>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')], dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> df = pd.DataFrame.from_records(data)
>>> df.dtypes
col_1 int32
col_2 object
dtype: object
>>> df.col_2 = df.col_2.astype('S1')
>>> df.dtypes
col_1 int32
col_2 |S1
dtype: object
>>> df.iloc[0, 1] = 'Some other string that is large'
>>> df.dtypes
col_1 int32
col_2 object
dtype: object
>>> type(df.iloc[1, 1])
<class 'bytes'>
As you can see, the current pandas version already converts the column dtype from
I am very unfamiliar with the pandas internals, but it seems pandas has adopted a very flexible model over the past several years: having 'mixed' types in one column is no longer a problem, and the column seems to be assigned the most compatible dtype. Given that, I don't think there is any real reason pandas cannot use the provided NumPy fixed-length string dtypes as the column types. At least for the
This is supposed to be object, never fixed-width strings from NumPy. They are not supported in pandas, and we are moving towards Arrow strings anyway, so this won't get support in pandas itself. Closing.
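Along the lines of the closing comment, a minimal sketch of keeping the text column in pandas' own string dtype rather than a NumPy fixed-width string, assuming pandas 1.0 or later where `"string"` (`pd.StringDtype`) exists. The data reuses the example from the earlier comment; `"string[pyarrow]"` would select Arrow-backed storage but needs pyarrow installed, so the sketch sticks to the default storage.

```python
import numpy as np
import pandas as pd

data = np.array(
    [(3, "a"), (2, "b"), (1, "c"), (0, "d")],
    dtype=[("col_1", "i4"), ("col_2", "U1")],
)
df = pd.DataFrame.from_records(data)

# Opt into pandas' dedicated string dtype for the text column.
# "string" uses the default storage; "string[pyarrow]" would require pyarrow.
df = df.astype({"col_2": "string"})

# The string dtype has no fixed width, so assigning a longer value
# keeps the column's dtype instead of upcasting it to object.
df.loc[0, "col_2"] = "Some other string that is large"
print(df.dtypes)
```

Unlike the `S1` case above, there is no width limit to overflow, so the column's dtype stays stable when longer values are assigned.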
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
When creating a DataFrame using from_records with a NumPy structured array, the current implementation doesn't respect the latter's dtypes. From some quick tests, it seems to me that the integer and float types are respected, but the str types are not.
Feature Description
Alternative Solutions
End users can just do a post-processing step similar to the one above.
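The post-processing workaround can be sketched roughly as follows, assuming the caller still has the original structured array at hand. `fixed_width_string_dtypes` is a hypothetical helper written for this example, not a pandas API. Whether pandas keeps the resulting `S` dtype as-is may depend on the version; as the discussion notes, NumPy fixed-width strings are not officially supported in pandas.

```python
import numpy as np
import pandas as pd


def fixed_width_string_dtypes(structured_dtype: np.dtype) -> dict:
    """Hypothetical helper (not part of pandas): map every text field of a
    structured dtype to a NumPy bytes dtype with the same number of characters."""
    mapping = {}
    for name in structured_dtype.names:
        field = structured_dtype[name]
        if field.kind == "U":
            # 'U' stores 4 bytes per character, so divide the itemsize by 4.
            mapping[name] = f"S{field.itemsize // 4}"
        elif field.kind == "S":
            mapping[name] = f"S{field.itemsize}"
    return mapping


data = np.array(
    [("alice", 30, 55.5), ("bob", 42, 80.25)],
    dtype=[("name", "U10"), ("age", "i2"), ("weight", "f4")],
)
df = pd.DataFrame.from_records(data)

# Post-process: cast the text columns to fixed-width bytes.
# Converting 'U' text to 'S' encodes as ASCII, so non-ASCII text raises an error.
mapping = fixed_width_string_dtypes(data.dtype)
restored = df.astype(mapping)
print(restored.dtypes)
```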
Additional Context
No response