BUG: Constructing a Series from a scalar generally doesn't work for extension types #28401

Closed
jschendel opened this issue Sep 12, 2019 · 3 comments · Fixed by #37989
Labels: Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); Dtype Conversions (Unexpected or buggy dtype conversions); ExtensionArray (Extending pandas with custom dtypes or arrays)

@jschendel
Member

Code Sample, a copy-pastable example if possible

Construction without specifying a dtype results in object dtype for Interval and Period:

In [2]: pd.Series(pd.Interval(0, 1), index=range(3))
Out[2]: 
0    (0, 1]
1    (0, 1]
2    (0, 1]
dtype: object

In [3]: pd.Series(pd.Period("2019Q1", freq="Q"), index=range(3))
Out[3]: 
0    2019Q1
1    2019Q1
2    2019Q1
dtype: object

This looks okay for a tz-aware Timestamp:

In [4]: pd.Series(pd.Timestamp("2019", tz="US/Eastern"), index=range(3))
Out[4]: 
0   2019-01-01 00:00:00-05:00
1   2019-01-01 00:00:00-05:00
2   2019-01-01 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

Specifying a dtype raises for Interval, Period and tz-aware Timestamp:

In [5]: pd.Series(pd.Interval(0, 1), index=range(3), dtype=pd.IntervalDtype("int64"))
---------------------------------------------------------------------------
TypeError: IntervalArray(...) must be called with a collection of some kind, (0, 1] was passed

In [6]: pd.Series(pd.Period("2019Q1", freq="Q"), index=range(3), dtype=pd.PeriodDtype("Q"))
---------------------------------------------------------------------------
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

In [7]: pd.Series(pd.Timestamp("2019", tz="US/Eastern"), index=range(3), dtype=pd.DatetimeTZDtype(tz="US/Eastern"))
---------------------------------------------------------------------------
TypeError: 'Timestamp' object is not iterable

Both of the above patterns appear to work fine with scalars that correspond to non-extension dtypes (e.g. numeric, tz-naive, timedelta).
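For contrast, here is a minimal sketch of the non-extension cases described above, both with and without an explicit dtype. The specific scalar values and variable names are illustrative assumptions, not taken from the original report.

```python
import pandas as pd

idx = range(3)

# Scalars backed by plain NumPy dtypes are broadcast across the index.
numeric = pd.Series(1.5, index=idx)
naive_ts = pd.Series(pd.Timestamp("2019-01-01"), index=idx)
delta = pd.Series(pd.Timedelta("1D"), index=idx)

# The same scalars with an explicit dtype also construct without raising.
numeric_typed = pd.Series(1.5, index=idx, dtype="float64")
naive_ts_typed = pd.Series(
    pd.Timestamp("2019-01-01"), index=idx, dtype="datetime64[ns]"
)
delta_typed = pd.Series(
    pd.Timedelta("1D"), index=idx, dtype="timedelta64[ns]"
)

for name, s in [
    ("numeric", numeric),
    ("tz-naive", naive_ts),
    ("timedelta", delta),
    ("numeric (typed)", numeric_typed),
    ("tz-naive (typed)", naive_ts_typed),
    ("timedelta (typed)", delta_typed),
]:
    print(name, s.dtype)
```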

Problem description

The Series constructor is not correctly inferring the dtype from scalar Interval/Period objects when a dtype isn't specified, and raises for Interval/Period/tz-aware Timestamp when a dtype is specified.

Expected Output

I'd expect the Series to be constructed with the proper dtype.
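Until the scalar path infers the dtype, one possible workaround is to repeat the scalar in a list so that the list-like inference path is used instead. This is only a sketch: `series_from_repeated_scalar` is a hypothetical helper name, and it assumes that a list of Interval or Period objects is inferred to the matching extension dtype on the pandas version in use.

```python
import pandas as pd


def series_from_repeated_scalar(value, index):
    """Hypothetical helper: broadcast `value` by repeating it in a list.

    A list of Interval/Period objects goes through list-like dtype
    inference rather than the scalar code path discussed in this issue.
    """
    index = pd.Index(index)
    return pd.Series([value] * len(index), index=index)


intervals = series_from_repeated_scalar(pd.Interval(0, 1), range(3))
periods = series_from_repeated_scalar(pd.Period("2019Q1", freq="Q"), range(3))

print(intervals.dtype)
print(periods.dtype)
```

Note that this materializes a Python list of the same length as the index, so it trades memory for correct inference on large indexes.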

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 261c3a6
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.14-041914-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.0+332.g261c3a667
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.10
pytest : 4.6.2
hypothesis : 4.23.6
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : 0.3.0
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
s3fs : 0.2.1
scipy : 1.2.1
sqlalchemy : 1.3.4
tables : 3.5.2
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

@jschendel jschendel added Bug Dtype Conversions Unexpected or buggy dtype conversions Period Period data type Interval Interval data type Constructors Series/DataFrame/Index/pd.array Constructors labels Sep 12, 2019
@jschendel jschendel added this to the Contributions Welcome milestone Sep 12, 2019
@jschendel jschendel removed Interval Interval data type Period Period data type labels Sep 12, 2019
@jschendel jschendel changed the title from "BUG: Constructing a Series from scalars generally doesn't work extension dtypes" to "BUG: Constructing a Series from a scalar generally doesn't work extension dtypes" Sep 12, 2019
@jschendel jschendel changed the title from "BUG: Constructing a Series from a scalar generally doesn't work extension dtypes" to "BUG: Constructing a Series from a scalar generally doesn't work for extension types" Sep 12, 2019
@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Sep 13, 2019
@jorisvandenbossche
Member

infer_dtype works fine on lists of intervals/periods:

In [11]: pd.api.types.infer_dtype([pd.Interval(0,1)], skipna=True)
Out[11]: 'interval'

so we could use that on object-dtype data? (Although that can also be costly.)
In the Series construction code, we currently have special handling for "maybe casting to datetime" (e.g. sanitize_array -> maybe_cast_to_datetime).
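The idea above can be sketched as follows: wrap the scalar in a one-element list, ask `infer_dtype` what it sees, and use the answer to decide whether an extension dtype is needed. The helper names `inferred_kind` and `needs_extension_dtype` are hypothetical and are not part of pandas; only `pd.api.types.infer_dtype` is the real API.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def inferred_kind(scalar):
    # infer_dtype expects a list-like, so wrap the scalar in a list.
    return infer_dtype([scalar], skipna=True)


def needs_extension_dtype(scalar):
    # Hypothetical check: only the two kinds discussed in this issue.
    return inferred_kind(scalar) in {"interval", "period"}


kinds = {
    "Interval": inferred_kind(pd.Interval(0, 1)),
    "Period": inferred_kind(pd.Period("2019Q1", freq="Q")),
    "int": inferred_kind(1),
}
print(kinds)
print(needs_extension_dtype(pd.Interval(0, 1)), needs_extension_dtype(1))
```

For a single scalar the inference cost is negligible; the cost concern raised above applies when scanning large object-dtype arrays.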

@KangMingHsi

KangMingHsi commented Jul 30, 2020

For this issue, I found that line 469 of construction.py,
dtype, value = infer_dtype_from_scalar(value)
should be changed to
dtype, value = infer_dtype_from_scalar(value, pandas_dtype=True)
With this change, the dtype bug is fixed:

> pd.Series(pd.Interval(0, 1), index=range(3))
0    (0, 1]
1    (0, 1]
2    (0, 1]
dtype: interval

The dtype of pd.Series(pd.Period("2019Q1", freq="Q"), index=range(3)) would then also be inferred correctly,
but this exposes another bug at line 711 of cast.py:
val = val.ordinal
This replaces the Period value with an int, which then fails:

Traceback (most recent call last):
  File "pandas/_libs/tslibs/period.pyx", line 1399, in pandas._libs.tslibs.period.extract_ordinals
    ordinals[i] = p.ordinal
AttributeError: 'int' object has no attribute 'ordinal'

Why does infer_dtype_from_scalar replace the Period value with Period.ordinal?
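To illustrate why the traceback above occurs, here is a small sketch of the relationship between a Period and its ordinal. It uses only the public `Period` API and does not reproduce the internal `infer_dtype_from_scalar` or `extract_ordinals` code.

```python
import pandas as pd

p = pd.Period("2019Q1", freq="Q")

# A Period stores an integer ordinal alongside its frequency.
ordinal = p.ordinal
print(type(ordinal).__name__)

# The bare ordinal is a plain int, so code that expects a Period and
# reads `.ordinal` from it fails with AttributeError, as in the traceback.
print(hasattr(ordinal, "ordinal"))

# Rebuilding the Period needs the frequency as well as the ordinal.
rebuilt = pd.Period(ordinal=ordinal, freq=p.freq)
print(rebuilt == p)
```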

@arw2019
Member

arw2019 commented Nov 21, 2020

This works on 1.2 master:

In [2]:  pd.Series(pd.Interval(0, 1), index=range(3))
Out[2]: 
0    (0, 1]
1    (0, 1]
2    (0, 1]
dtype: interval

In [3]: pd.Series(pd.Period("2019Q1", freq="Q"), index=range(3))
Out[3]: 
0    2019Q1
1    2019Q1
2    2019Q1
dtype: period[Q-DEC]
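A small script along these lines could be used to check a given pandas build against the cases in the original report. It is a sketch only: `try_construct` is a hypothetical helper, "UTC" is substituted for the "US/Eastern" zone used in the report so that no time zone database is required, and no particular outcome is claimed for the explicit-dtype cases.

```python
import pandas as pd


def try_construct(scalar, dtype=None):
    """Return the resulting dtype as a string, or the exception name."""
    try:
        return str(pd.Series(scalar, index=range(3), dtype=dtype).dtype)
    except Exception as exc:  # report the failure instead of raising
        return type(exc).__name__


cases = {
    "Interval": (pd.Interval(0, 1), pd.IntervalDtype("int64")),
    "Period": (pd.Period("2019Q1", freq="Q"), pd.PeriodDtype("Q")),
    "Timestamp (tz)": (
        pd.Timestamp("2019", tz="UTC"),
        pd.DatetimeTZDtype(tz="UTC"),
    ),
}

report = {}
for name, (scalar, dtype) in cases.items():
    report[name] = {
        "inferred": try_construct(scalar),
        "explicit": try_construct(scalar, dtype),
    }
    print(name, report[name])
```

On an affected version the "inferred" column shows object for Interval and Period, and the "explicit" column shows exception names; on a fixed version both columns should show extension dtypes.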

@jreback jreback modified the milestones: Contributions Welcome, 1.2 Nov 24, 2020

5 participants