Fix Series construction with dtype=str #20401

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -1035,6 +1035,7 @@ Reshaping
 - Bug in :class:`Series` constructor with ``Categorical`` where a ``ValueError`` is not raised when an index of different length is given (:issue:`19342`)
 - Bug in :meth:`DataFrame.astype` where column metadata is lost when converting to categorical or a dictionary of dtypes (:issue:`19920`)
 - Bug in :func:`cut` and :func:`qcut` where timezone information was dropped (:issue:`19872`)
+- Bug in :class:`Series` constructor with ``dtype=str``, which previously raised in some cases (:issue:`19853`)
 
 Other
 ^^^^^
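The release note can be illustrated with a quick check against a recent pandas release. This is a sketch that assumes a pandas version containing this fix (0.23.0 or later); it compares values rather than dtypes, because the dtype that `dtype=str` maps to may differ between pandas versions.

```python
import pandas as pd

# A scalar string broadcast over an explicit index with dtype=str,
# the combination reported in GH 19853.
s = pd.Series('', dtype=str, index=range(3))
print(list(s))  # ['', '', '']

# The same scalar without dtype=str, for comparison.
t = pd.Series('', index=range(3))
print(list(t))  # ['', '', '']

# List-like data with dtype=str is converted element-wise to strings.
u = pd.Series([1, 2], dtype=str)
print(list(u))  # ['1', '2']
```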
9 changes: 5 additions & 4 deletions pandas/core/series.py
@@ -4059,9 +4059,10 @@ def _try_cast(arr, take_fast_path):
     if issubclass(subarr.dtype.type, compat.string_types):
         # GH 16605
         # If not empty convert the data to dtype
-        if not isna(data).all():
Contributor

You don't need all this; just change it to use np.any:

In [3]: np.any(pd.isna(''))
Out[3]: False
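The suggestion works because, for a scalar argument, `pd.isna` returns a plain Python `bool`, and `bool` has no `.all()` method, whereas `np.any` and `np.all` accept scalars and arrays alike. A minimal demonstration (the merged patch ultimately uses `np.all`):

```python
import numpy as np
import pandas as pd

# For a scalar, pd.isna returns a plain Python bool ...
flag = pd.isna('')
print(type(flag).__name__)  # bool

# ... so calling .all() on it fails, which is how the old
# `isna(data).all()` check would break for scalar data.
try:
    flag.all()
    failed = False
except AttributeError:
    failed = True
print(failed)  # True

# np.all / np.any accept both scalars and arrays.
print(np.all(pd.isna('')))              # False
print(np.all(pd.isna([None, np.nan])))  # True
print(np.any(pd.isna(['a', None])))     # True
```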

-            data = np.array(data, dtype=dtype, copy=False)
-
-        subarr = np.array(data, dtype=object, copy=copy)
+        # GH 19853: If data is a scalar, subarr already has the result
+        if not is_scalar(data):
Contributor

Is this additional check (is_scalar) actually needed?

Contributor Author

Yes. If data is a scalar, no further change is needed, since subarr already holds the correct result. If it is not a scalar, the branch performs the conversion:

data = np.array(data, dtype=dtype, copy=False)

(That conversion was already there.)

Contributor

My point is: do you actually need to add this check? If you take it out (but fix the np.any call), is there any problem?

Contributor Author

If I take it out, then this line of the new test https://github.com/pandas-dev/pandas/pull/20401/files#diff-3bbe4551f20de6060dce38a95d0adc80R114 raises:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../core/series.py:270: in __init__
    data = SingleBlockManager(data, index, fastpath=True)
../core/internals.py:4632: in __init__
    block = make_block(block, placement=slice(0, len(axis)), ndim=1)
../core/internals.py:3161: in make_block
    return klass(values, ndim=ndim, placement=placement)
../core/internals.py:2268: in __init__
    placement=placement)
../core/internals.py:117: in __init__
    self.ndim = self._check_ndim(values, ndim)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[AttributeError("ndim") raised in repr()] ObjectBlock object at 0x7f1ae266da98>
values = array('', dtype=object), ndim = 1

    def _check_ndim(self, values, ndim):
        """ndim inference and validation.
    
            Infers ndim from 'values' if not provided to __init__.
            Validates that values.ndim and ndim are consistent if and only if
            the class variable '_validate_ndim' is True.
    
            Parameters
            ----------
            values : array-like
            ndim : int or None
    
            Returns
            -------
            ndim : int
    
            Raises
            ------
            ValueError : the number of dimensions do not match
            """
        if ndim is None:
            ndim = values.ndim
    
        if self._validate_ndim and values.ndim != ndim:
            msg = ("Wrong number of dimensions. values.ndim != ndim "
                   "[{} != {}]")
>           raise ValueError(msg.format(values.ndim, ndim))
E           ValueError: Wrong number of dimensions. values.ndim != ndim [0 != 1]

../core/internals.py:153: ValueError
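The error comes from dimensionality: re-wrapping a scalar with `np.array` yields a 0-dimensional array, which would overwrite the already-broadcast 1-dimensional `subarr`, while a Series block expects 1-dimensional values, hence `values.ndim != ndim [0 != 1]`. A NumPy-only illustration; the 3-element array stands in for the broadcast `subarr` and is an assumption for illustration:

```python
import numpy as np

# Re-wrapping a scalar gives a 0-d array, like the
# `array('', dtype=object)` shown in the traceback above.
scalar_arr = np.array('', dtype=object)
print(scalar_arr.ndim, scalar_arr.shape)  # 0 ()

# A value already broadcast over an index of length 3 is 1-d.
broadcast_arr = np.array(['', '', ''], dtype=object)
print(broadcast_arr.ndim, broadcast_arr.shape)  # 1 (3,)

# A 1-d block rejects the 0-d array: 0 != 1.
expected_ndim = 1
print(scalar_arr.ndim != expected_ndim)  # True
```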

+            if not np.all(isna(data)):
+                data = np.array(data, dtype=dtype, copy=False)
+            subarr = np.array(data, dtype=object, copy=copy)
 
     return subarr
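Read as a whole, the patched branch can be sketched as the standalone function below. This is not pandas code: `convert_string_branch` is a hypothetical name, `subarr` is assumed to already hold the broadcast result for scalar input (as the author explains in the thread), and `np.asarray` replaces `np.array(..., copy=False)` because NumPy 2.x raises when `copy=False` cannot be honored.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_scalar


def convert_string_branch(data, subarr, dtype=str):
    """Hypothetical standalone sketch of the patched branch (not pandas API).

    `subarr` is assumed to hold the result built earlier in the constructor;
    for scalar `data` it is returned unchanged.
    """
    # GH 19853: for a scalar, subarr already holds the broadcast result,
    # so re-wrapping `data` here would produce a 0-d array.
    if not is_scalar(data):
        # GH 16605: only cast to the string dtype if some value is non-missing.
        # np.all works whether pd.isna returns a bool or an array.
        if not np.all(pd.isna(data)):
            data = np.asarray(data, dtype=dtype)
        # Always copies here; the original honors a `copy` flag instead.
        subarr = np.array(data, dtype=object)
    return subarr


# Scalar input: the pre-built, broadcast subarr is kept as-is.
pre_built = np.array(['', '', ''], dtype=object)
print(convert_string_branch('', pre_built).shape)  # (3,)

# List input with real values: converted to strings, stored as object.
print(list(convert_string_branch([1, 2], None)))  # ['1', '2']

# All-missing input: left unconverted, so NaN stays NaN.
out = convert_string_branch([np.nan, np.nan], None)
print(pd.isna(out).all())  # True
```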
5 changes: 5 additions & 0 deletions pandas/tests/series/test_constructors.py
@@ -110,6 +110,11 @@ def test_constructor_empty(self, input_class):
         empty2 = Series(input_class(), index=lrange(10), dtype='float64')
         assert_series_equal(empty, empty2)
 
+        # GH 19853 : with empty string, index and dtype str
+        empty = Series('', dtype=str, index=range(3))
+        empty2 = Series('', index=range(3))
+        assert_series_equal(empty, empty2)
+
     @pytest.mark.parametrize('input_arg', [np.nan, float('nan')])
     def test_constructor_nan(self, input_arg):
         empty = Series(dtype='float64', index=lrange(10))
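The same property can be checked for a few other string scalars outside the test suite. A sketch using only public pandas API; `check_scalar_with_dtype_str` is a hypothetical helper, and it compares values and index rather than dtype, since the dtype chosen for `dtype=str` may vary across pandas versions:

```python
import pandas as pd


def check_scalar_with_dtype_str(value, n=3):
    """Hypothetical helper: does dtype=str give the same values as inference?"""
    with_dtype = pd.Series(value, dtype=str, index=range(n))
    inferred = pd.Series(value, index=range(n))
    # Compare values and index only; dtype may differ by pandas version.
    return list(with_dtype) == list(inferred) and with_dtype.index.equals(inferred.index)


for value in ['', 'a', 'xyz']:
    print(repr(value), check_scalar_with_dtype_str(value))
```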