Fix Series construction with dtype=str #20401

Merged

Conversation

@nikoskaragiannakis (Contributor) commented Mar 18, 2018

TST: Added test for constructing a Series with dtype=str
BUG: Handles the case where data is a scalar
DOC: added changes to whatsnew/v0.23.0.txt

codecov bot commented Mar 18, 2018

Codecov Report

Merging #20401 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20401      +/-   ##
==========================================
+ Coverage   91.77%   91.77%   +<.01%     
==========================================
  Files         152      152              
  Lines       49203    49214      +11     
==========================================
+ Hits        45155    45168      +13     
+ Misses       4048     4046       -2
Flag        Coverage Δ
#multiple   90.16% <100%> (ø) ⬆️
#single     41.84% <100%> (-0.01%) ⬇️

Impacted Files           Coverage Δ
pandas/core/series.py    93.85% <100%> (ø) ⬆️
pandas/util/testing.py   84.11% <0%> (+0.37%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@nikoskaragiannakis changed the title from "Fix series construction str" to "Fix Series construction with dtype=str" Mar 18, 2018
@@ -110,6 +110,11 @@ def test_constructor_empty(self, input_class):
empty2 = Series(input_class(), index=lrange(10), dtype='float64')
assert_series_equal(empty, empty2)

# GH 19853 : with empty string, index and dtype str
empty = Series('', dtype='str', index=range(3))
assert empty.all() == ''
Contributor

construct the resultant series and use assert_series_equal
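A minimal sketch of what that suggestion could look like (my illustration, using pandas.util.testing as it existed at the time; the test that was actually merged may differ):

import pandas as pd
from pandas.util.testing import assert_series_equal

# GH 19853: scalar empty string with dtype=str and an explicit index
empty = pd.Series('', dtype=str, index=range(3))
# build the expected Series explicitly instead of asserting on .all()
expected = pd.Series(['', '', ''], index=range(3))
assert_series_equal(empty, expected)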

@@ -714,6 +714,7 @@ Other API Changes
- ``pd.to_datetime('today')`` now returns a datetime, consistent with ``pd.Timestamp('today')``; previously ``pd.to_datetime('today')`` returned a ``.normalized()`` datetime (:issue:`19935`)
- :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`)
- :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`)
- ``Series`` construction with a ``string``, ``dtype=str`` specified, and ``index`` specified will now return an ``object`` dtyped ``Series``, previously this would raise an AttributeError (:issue:`19853`)
Contributor

move to Bug / reshaping.

Contributor

reference :class:`Series`

you can simplify this, just say construction with dtype=str previously raised in some cases.
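For reference, the user-facing behaviour the note describes (my illustration of the post-fix behaviour, not taken from the PR):

import pandas as pd

# Before this fix, a scalar string combined with dtype=str and an explicit
# index raised; with the fix the scalar is broadcast and an object-dtype
# Series is returned.
s = pd.Series('abc', dtype=str, index=range(3))
s.dtype      # dtype('O')
s.tolist()   # ['abc', 'abc', 'abc']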

@@ -4059,9 +4059,14 @@ def _try_cast(arr, take_fast_path):
if issubclass(subarr.dtype.type, compat.string_types):
# GH 16605
# If not empty convert the data to dtype
if not isna(data).all():
Contributor

you don't need all this, just change to use np.any

In [3]: np.any(pd.isna(''))
Out[3]: False
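To spell out why np.any helps here (my illustration): pd.isna on a scalar returns a plain Python bool, which has no .all() method, while np.any accepts scalars and arrays alike.

import numpy as np
import pandas as pd

pd.isna('')                      # False -- a plain bool, which has no .all()
# pd.isna('').all()              # AttributeError: 'bool' object has no attribute 'all'
np.any(pd.isna(''))              # False -- works for a scalar
np.any(pd.isna(np.array(['']))) # False -- and for array input as well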

@jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Dtype Conversions (Unexpected or buggy dtype conversions), and Bug labels Mar 19, 2018

subarr = np.array(data, dtype=object, copy=copy)
# GH 19853: If data is a scalar, subarr has already the result
if not np.isscalar(data):
Contributor

right but is this still an extra call here? do we need the scalar check? (and should be is_scalar anyhow if it's needed)

Contributor Author

We need a check indeed. That was the problem from the beginning.
If it is scalar, subarr has already the correct result.
I'll change it to is_scalar
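For context, a small illustration (mine, not from the PR) of why pandas' is_scalar is preferred over np.isscalar: the two disagree on some inputs pandas has to handle.

import numpy as np
from pandas.api.types import is_scalar

is_scalar('')       # True  -- strings count as scalars
is_scalar(['a'])    # False -- list-likes do not
is_scalar(None)     # True
np.isscalar(None)   # False -- one of the inputs where the two functions disagree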

@jreback added this to the 0.23.0 milestone Mar 19, 2018
@nikoskaragiannakis (Contributor Author)

@jreback any more comments here?


subarr = np.array(data, dtype=object, copy=copy)
# GH 19853: If data is a scalar, subarr has already the result
if not is_scalar(data):
Contributor

is this additional check actually needed? (is_scalar)

Contributor Author

Yes. If it is scalar, no other change is needed, since subarr has the correct result. If it is not scalar, then it does the conversion

data = np.array(data, dtype=dtype, copy=False)

(That conversion was there already)

Contributor

my point is: do you actually need to add this check? when you take it out (but fix the np.any), is there any problem?

Contributor Author

If I take it out, then this https://github.com/pandas-dev/pandas/pull/20401/files#diff-3bbe4551f20de6060dce38a95d0adc80R114 will raise:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../core/series.py:270: in __init__
    data = SingleBlockManager(data, index, fastpath=True)
../core/internals.py:4632: in __init__
    block = make_block(block, placement=slice(0, len(axis)), ndim=1)
../core/internals.py:3161: in make_block
    return klass(values, ndim=ndim, placement=placement)
../core/internals.py:2268: in __init__
    placement=placement)
../core/internals.py:117: in __init__
    self.ndim = self._check_ndim(values, ndim)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[AttributeError("ndim") raised in repr()] ObjectBlock object at 0x7f1ae266da98>
values = array('', dtype=object), ndim = 1

    def _check_ndim(self, values, ndim):
        """ndim inference and validation.
    
            Infers ndim from 'values' if not provided to __init__.
            Validates that values.ndim and ndim are consistent if and only if
            the class variable '_validate_ndim' is True.
    
            Parameters
            ----------
            values : array-like
            ndim : int or None
    
            Returns
            -------
            ndim : int
    
            Raises
            ------
            ValueError : the number of dimensions do not match
            """
        if ndim is None:
            ndim = values.ndim
    
        if self._validate_ndim and values.ndim != ndim:
            msg = ("Wrong number of dimensions. values.ndim != ndim "
                   "[{} != {}]")
>           raise ValueError(msg.format(values.ndim, ndim))
E           ValueError: Wrong number of dimensions. values.ndim != ndim [0 != 1]

../core/internals.py:153: ValueError
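For what it's worth, a minimal illustration (mine) of where the 0-dimensional values in this traceback come from: re-wrapping a scalar string with np.array yields a 0-d array, while the block machinery expects 1-d values.

import numpy as np

arr = np.array('', dtype=object)   # what the cast does to a scalar string
arr.ndim                           # 0 -- a 0-dimensional array
arr.shape                          # ()
# SingleBlockManager / make_block expect 1-d values here, hence
# "Wrong number of dimensions. values.ndim != ndim [0 != 1]"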

@jreback merged commit 77d5ea0 into pandas-dev:master Mar 30, 2018
@jreback (Contributor) commented Mar 30, 2018

thanks @nikoskaragiannakis

Labels: Bug, Dtype Conversions (Unexpected or buggy dtype conversions), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Development

Successfully merging this pull request may close these issues.

BUG: invalid construction of a Series with dtype=str