Fix Series construction with dtype=str #20401

nikoskaragiannakis · 2018-03-18T16:20:47Z

TST: Added test for construction Series with dtype=str
BUG: Handles case where data is scalar
DOC: added changes to whatsnew/v0.23.0.txt

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

closes BUG: invalid constrution of a Series with dtype=str #19853
tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
whatsnew entry

codecov · 2018-03-18T18:49:25Z

Codecov Report

Merging #20401 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20401      +/-   ##
==========================================
+ Coverage   91.77%   91.77%   +<.01%     
==========================================
  Files         152      152              
  Lines       49203    49214      +11     
==========================================
+ Hits        45155    45168      +13     
+ Misses       4048     4046       -2

| Flag | Coverage Δ | |
| --- | --- | --- |
| #multiple | `90.16% <100%> (ø)` | ⬆️ |
| #single | `41.84% <100%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| pandas/core/series.py | `93.85% <100%> (ø)` | ⬆️ |
| pandas/util/testing.py | `84.11% <0%> (+0.37%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bd8a3cf...41d35ca.

jreback · 2018-03-19T10:35:10Z

pandas/tests/series/test_constructors.py

@@ -110,6 +110,11 @@ def test_constructor_empty(self, input_class):
            empty2 = Series(input_class(), index=lrange(10), dtype='float64')
            assert_series_equal(empty, empty2)

+            # GH 19853 : with empty string, index and dtype str
+            empty = Series('', dtype='str', index=range(3))
+            assert empty.all() == ''


construct the resultant series and use assert_series_equal
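A minimal sketch of what this suggestion could look like, assuming a reasonably recent pandas where `pandas.testing.assert_series_equal` is available. The `expected` Series below is an illustration and not necessarily the test that was merged. Comparing whole Series checks values, index, and dtype together, whereas an assertion on a reduction such as `.all()` checks far less.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Scalar data broadcast over an explicit index, with dtype=str requested
result = pd.Series('', dtype=str, index=range(3))

# Build the expected Series explicitly from list-like data
expected = pd.Series(['', '', ''], dtype=str, index=range(3))

# Raises AssertionError if values, index, or dtype differ
assert_series_equal(result, expected)
```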

jreback · 2018-03-19T10:35:29Z

doc/source/whatsnew/v0.23.0.txt

@@ -714,6 +714,7 @@ Other API Changes
 - ``pd.to_datetime('today')`` now returns a datetime, consistent with ``pd.Timestamp('today')``; previously ``pd.to_datetime('today')`` returned a ``.normalized()`` datetime (:issue:`19935`)
 - :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`)
 - :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`)
+``Series`` construction with a ``string``, ``dtype=str`` specified, and ``index`` specified will now return an ``object`` dtyped ``Series``, previously this would raise an AttributeError (:issue:`19853`)


move to Bug / reshaping.

jreback · 2018-03-19T10:36:44Z

doc/source/whatsnew/v0.23.0.txt

@@ -714,6 +714,7 @@ Other API Changes
 - ``pd.to_datetime('today')`` now returns a datetime, consistent with ``pd.Timestamp('today')``; previously ``pd.to_datetime('today')`` returned a ``.normalized()`` datetime (:issue:`19935`)
 - :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`)
 - :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`)
+``Series`` construction with a ``string``, ``dtype=str`` specified, and ``index`` specified will now return an ``object`` dtyped ``Series``, previously this would raise an AttributeError (:issue:`19853`)


reference :class:`Series`

you can simplify this, just say construction with dtype=str previously raised in some cases.

jreback · 2018-03-19T10:38:25Z

pandas/core/series.py

@@ -4059,9 +4059,14 @@ def _try_cast(arr, take_fast_path):
    if issubclass(subarr.dtype.type, compat.string_types):
        # GH 16605
        # If not empty convert the data to dtype
-        if not isna(data).all():


you don't need all this, just change to use np.any

In [3]: np.any(pd.isna(''))
Out[3]: False
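To illustrate why the NumPy functions are suggested here: `pd.isna` gives back a single boolean for scalar input and a boolean array for array input, and `np.any` / `np.all` accept both, while a method call like `.all()` only exists on the array result. The sketch below uses public pandas and NumPy calls only; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Scalar in, single boolean out
scalar_mask = pd.isna('')

# Array in, boolean array out
array_mask = pd.isna(np.array(['', None], dtype=object))

# np.any / np.all work on both kinds of result
any_scalar = bool(np.any(scalar_mask))
any_array = bool(np.any(array_mask))
all_array = bool(np.all(array_mask))

print(any_scalar, any_array, all_array)  # prints: False True False
```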

jreback · 2018-03-19T23:51:02Z

pandas/core/series.py

-
-        subarr = np.array(data, dtype=object, copy=copy)
+        # GH 19853: If data is a scalar, subarr has already the result
+        if not np.isscalar(data):


right, but is this still an extra call here? do we need the scalar check? (and it should be is_scalar anyhow if it's needed)

nikoskaragiannakis · 2018-03-20

We need a check indeed. That was the problem from the beginning.
If it is scalar, subarr has already the correct result.
I'll change it to is_scalar
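For context on the `np.isscalar` versus `is_scalar` distinction, here is a small comparison using the public `pandas.api.types.is_scalar`. The list of sample values is chosen for illustration and does not come from the PR; it shows that the two functions agree on strings and floats but differ on values such as `None` and `datetime` objects.

```python
import datetime

import numpy as np
from pandas.api.types import is_scalar

# Sample values chosen for illustration only
candidates = ['', 1.5, None, datetime.datetime(2018, 3, 18), [1, 2]]

# Map each value's repr to (np.isscalar result, pandas is_scalar result)
comparison = {
    repr(value): (bool(np.isscalar(value)), bool(is_scalar(value)))
    for value in candidates
}

for name, (numpy_says, pandas_says) in comparison.items():
    print(name, numpy_says, pandas_says)
```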

nikoskaragiannakis · 2018-03-22T13:21:14Z

@jreback any more comments here?

jreback · 2018-03-25T14:34:44Z

pandas/core/series.py

-
-        subarr = np.array(data, dtype=object, copy=copy)
+        # GH 19853: If data is a scalar, subarr has already the result
+        if not is_scalar(data):


is this additional check actually needed? (is_scalar)

nikoskaragiannakis · 2018-03-25

Yes. If it is scalar, no other change is needed, since subarr already has the correct result. If it is not scalar, then it does the conversion

data = np.array(data, dtype=dtype, copy=False)

(That conversion was there already)
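The branching described in this reply can be sketched outside pandas as follows. This is a simplified stand-in, not the code in `pandas/core/series.py`: `to_object_strings` is a hypothetical helper, and it uses `np.asarray` rather than `np.array(..., copy=False)` so that it behaves the same on older and newer NumPy releases.

```python
import numpy as np
from pandas.api.types import is_scalar


def to_object_strings(data):
    """Hypothetical helper: stringify list-like data, pass scalars through."""
    if is_scalar(data):
        # A scalar needs no element-wise conversion
        return data
    # List-like data: convert elements to str, then store them as objects
    as_str = np.asarray(data, dtype=str)
    return as_str.astype(object)


converted = to_object_strings([1, 2])
untouched = to_object_strings('')
```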

jreback · 2018-03-25

my point is: do you actually need to add this check? when you take it out (but keep the np.any fix), is there any problem?

nikoskaragiannakis · 2018-03-26

If I take it out, then this https://github.com/pandas-dev/pandas/pull/20401/files#diff-3bbe4551f20de6060dce38a95d0adc80R114 will raise

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../core/series.py:270: in __init__
    data = SingleBlockManager(data, index, fastpath=True)
../core/internals.py:4632: in __init__
    block = make_block(block, placement=slice(0, len(axis)), ndim=1)
../core/internals.py:3161: in make_block
    return klass(values, ndim=ndim, placement=placement)
../core/internals.py:2268: in __init__
    placement=placement)
../core/internals.py:117: in __init__
    self.ndim = self._check_ndim(values, ndim)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <[AttributeError("ndim") raised in repr()] ObjectBlock object at 0x7f1ae266da98>
values = array('', dtype=object), ndim = 1

    def _check_ndim(self, values, ndim):
        """ndim inference and validation.

        Infers ndim from 'values' if not provided to __init__.
        Validates that values.ndim and ndim are consistent if and only if
        the class variable '_validate_ndim' is True.

        Parameters
        ----------
        values : array-like
        ndim : int or None

        Returns
        -------
        ndim : int

        Raises
        ------
        ValueError : the number of dimensions do not match
        """
        if ndim is None:
            ndim = values.ndim

        if self._validate_ndim and values.ndim != ndim:
            msg = ("Wrong number of dimensions. values.ndim != ndim "
                   "[{} != {}]")
>           raise ValueError(msg.format(values.ndim, ndim))
E           ValueError: Wrong number of dimensions. values.ndim != ndim [0 != 1]

../core/internals.py:153: ValueError
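The `[0 != 1]` in the error can be reproduced with NumPy alone: wrapping a scalar string in `np.array(..., dtype=object)` yields a zero-dimensional array, whereas a Series needs one-dimensional values. The sketch below only demonstrates that shape difference; `index_len` is an illustrative name and the `np.full` call is one possible way to get the 1-d layout, not what pandas does internally.

```python
import numpy as np

# Wrapping a scalar gives a 0-d array, matching `values = array('', dtype=object)`
scalar_as_array = np.array('', dtype=object)
print(scalar_as_array.ndim, scalar_as_array.shape)  # prints: 0 ()

# Broadcasting the same scalar over an index of length 3 gives 1-d values
index_len = 3
broadcast = np.full(index_len, '', dtype=object)
print(broadcast.ndim, broadcast.shape)  # prints: 1 (3,)
```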

jreback · 2018-03-30T20:09:32Z

thanks @nikoskaragiannakis

nikoskaragiannakis added 3 commits March 18, 2018 16:15

TST: Added test for construction Series with dtype=str

9569eb3

BUG: Handles case where data is scalar

a188cf7

DOC: added changes to whatsnew/v0.23.0.txt

1844e6e

nikoskaragiannakis changed the title ~~Fix series construction str~~ Fix Series construction with dtype=str Mar 18, 2018

jreback requested changes Mar 19, 2018

jreback added the Reshaping, Dtype Conversions, and Bug labels Mar 19, 2018

nikoskaragiannakis added 4 commits March 19, 2018 22:59

BUG: simplification

55f9998

TST: better testing

d6aac90

DOC: better documentation

ba2d2c0

Merge branch 'master' into fix_series_construction_str

81fde6e

jreback requested changes Mar 19, 2018

jreback added this to the 0.23.0 milestone Mar 19, 2018

BUG: use is_scalar instead of np.isscalar

41d35ca

TomAugspurger approved these changes Mar 20, 2018

jreback requested changes Mar 25, 2018

jreback approved these changes Mar 30, 2018

jreback merged commit 77d5ea0 into pandas-dev:master Mar 30, 2018

kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018

Fix Series construction with dtype=str (pandas-dev#20401)

dccb4f6

jreback mentioned this pull request May 31, 2018

Potential regression in str dtype handling in 0.23? #21270

Closed
