-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
Fix Series construction with dtype=str #20401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Series construction with dtype=str #20401
Conversation
Codecov Report
@@ Coverage Diff @@
## master #20401 +/- ##
==========================================
+ Coverage 91.77% 91.77% +<.01%
==========================================
Files 152 152
Lines 49203 49214 +11
==========================================
+ Hits 45155 45168 +13
+ Misses 4048 4046 -2
Continue to review full report at Codecov.
|
@@ -110,6 +110,11 @@ def test_constructor_empty(self, input_class): | |||
empty2 = Series(input_class(), index=lrange(10), dtype='float64') | |||
assert_series_equal(empty, empty2) | |||
|
|||
# GH 19853 : with empty string, index and dtype str | |||
empty = Series('', dtype='str', index=range(3)) | |||
assert empty.all() == '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
construct the resultant series and use assert_series_equal
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -714,6 +714,7 @@ Other API Changes | |||
- ``pd.to_datetime('today')`` now returns a datetime, consistent with ``pd.Timestamp('today')``; previously ``pd.to_datetime('today')`` returned a ``.normalized()`` datetime (:issue:`19935`) | |||
- :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`) | |||
- :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`) | |||
``Series`` construction with a ``string``, ``dtype=str`` specified, and ``index`` specified will now return an ``object`` dtyped ``Series``, previously this would raise an AttributeError (:issue:`19853`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to Bug / reshaping.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -714,6 +714,7 @@ Other API Changes | |||
- ``pd.to_datetime('today')`` now returns a datetime, consistent with ``pd.Timestamp('today')``; previously ``pd.to_datetime('today')`` returned a ``.normalized()`` datetime (:issue:`19935`) | |||
- :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`) | |||
- :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`) | |||
``Series`` construction with a ``string``, ``dtype=str`` specified, and ``index`` specified will now return an ``object`` dtyped ``Series``, previously this would raise an AttributeError (:issue:`19853`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
referencde :class:`Series`
you can simplify this, just say construction with dtype=str
previously raised in some cases.
@@ -4059,9 +4059,14 @@ def _try_cast(arr, take_fast_path): | |||
if issubclass(subarr.dtype.type, compat.string_types): | |||
# GH 16605 | |||
# If not empty convert the data to dtype | |||
if not isna(data).all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need all this, just change to use np.any
In [3]: np.any(pd.isna(''))
Out[3]: False
pandas/core/series.py
Outdated
|
||
subarr = np.array(data, dtype=object, copy=copy) | ||
# GH 19853: If data is a scalar, subarr has already the result | ||
if not np.isscalar(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right but is this still an extra call here? do we need the scalar check? (and should be is_scalar anyhow if its needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a check indeed. That was the problem from the beginning.
If it is scalar, subarr
has already the correct result.
I'll change it to is_scalar
@jreback any more comments here? |
|
||
subarr = np.array(data, dtype=object, copy=copy) | ||
# GH 19853: If data is a scalar, subarr has already the result | ||
if not is_scalar(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this additional check actually needed? (is_scalar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If it is scalar, no other change is needed, since subarr
has the correct result. If it is not scalar, then it does the conversion
data = np.array(data, dtype=dtype, copy=False)
(That conversion was there already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is that do you actually need to add this check? when you take it out (but fix the np.any
) is there any problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I take out, then this https://github.com/pandas-dev/pandas/pull/20401/files#diff-3bbe4551f20de6060dce38a95d0adc80R114 will raise
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../core/series.py:270: in __init__
data = SingleBlockManager(data, index, fastpath=True)
../core/internals.py:4632: in __init__
block = make_block(block, placement=slice(0, len(axis)), ndim=1)
../core/internals.py:3161: in make_block
return klass(values, ndim=ndim, placement=placement)
../core/internals.py:2268: in __init__
placement=placement)
../core/internals.py:117: in __init__
self.ndim = self._check_ndim(values, ndim)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <[AttributeError("ndim") raised in repr()] ObjectBlock object at 0x7f1ae266da98>
values = array('', dtype=object), ndim = 1
def _check_ndim(self, values, ndim):
"""ndim inference and validation.
Infers ndim from 'values' if not provided to __init__.
Validates that values.ndim and ndim are consistent if and only if
the class variable '_validate_ndim' is True.
Parameters
----------
values : array-like
ndim : int or None
Returns
-------
ndim : int
Raises
------
ValueError : the number of dimensions do not match
"""
if ndim is None:
ndim = values.ndim
if self._validate_ndim and values.ndim != ndim:
msg = ("Wrong number of dimensions. values.ndim != ndim "
"[{} != {}]")
> raise ValueError(msg.format(values.ndim, ndim))
E ValueError: Wrong number of dimensions. values.ndim != ndim [0 != 1]
../core/internals.py:153: ValueError
thanks @nikoskaragiannakis |
TST: Added test for construction Series with dtype=str
BUG: Handles case where data is scalar
DOC: added changes to whatsnew/v0.23.0.txt
Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):
git diff upstream/master -u -- "*.py" | flake8 --diff