Potential regression in master re empty Extension Indexes #23933
Root cause is that we don't factorize empty period arrays correctly:
In [2]: arr = pd.PeriodIndex([], freq='D')
In [3]: pd.factorize(arr)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-65d36072b155> in <module>()
----> 1 pd.factorize(arr)
~/sandbox/pandas/pandas/util/_decorators.py in wrapper(*args, **kwargs)
175 else:
176 kwargs[new_arg_name] = new_arg_value
--> 177 return func(*args, **kwargs)
178 return wrapper
179 return _deprecate_kwarg
~/sandbox/pandas/pandas/core/algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
630 assume_unique=True)
631
--> 632 uniques = _reconstruct_data(uniques, dtype, original)
633
634 # return original tenor
~/sandbox/pandas/pandas/core/algorithms.py in _reconstruct_data(values, dtype, original)
146 from pandas import Index
147 if is_extension_array_dtype(dtype):
--> 148 values = dtype.construct_array_type()._from_sequence(values)
149 elif is_datetime64tz_dtype(dtype) or is_period_dtype(dtype):
150 values = Index(original)._shallow_copy(values, name=None)
~/sandbox/pandas/pandas/core/arrays/period.py in _from_sequence(cls, scalars, dtype, copy)
200 periods = periods.copy()
201
--> 202 freq = freq or libperiod.extract_freq(periods)
203 ordinals = libperiod.extract_ordinals(periods, freq)
204 return cls(ordinals, freq=freq)
~/sandbox/pandas/pandas/_libs/tslibs/period.pyx in pandas._libs.tslibs.period.extract_freq()
1484 pass
1485
-> 1486 raise ValueError('freq not specified and cannot be inferred')
1487
1488
ValueError: freq not specified and cannot be inferred
PeriodArray._from_sequence should probably check
It'd be good to add a base extension test for factorizing an empty array.
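A test along those lines might look roughly like the sketch below. The helper name `check_factorize_empty` is hypothetical and is not taken from the pandas test suite; the sketch only uses the public `pd.factorize` and assumes a pandas version in which the empty case has been fixed.

```python
import numpy as np
import pandas as pd


def check_factorize_empty(arr):
    """Hypothetical helper: factorizing an empty array should not raise."""
    codes, uniques = pd.factorize(arr)
    # Codes come back as an integer ndarray, even with no elements.
    assert isinstance(codes, np.ndarray)
    assert codes.dtype.kind == "i"
    assert len(codes) == 0
    # Uniques should be empty and keep the input dtype, including the freq.
    assert len(uniques) == 0
    assert uniques.dtype == arr.dtype
    return codes, uniques


# Same input as in the traceback above: an empty daily PeriodIndex.
empty_periods = pd.PeriodIndex([], freq="D")
codes, uniques = check_factorize_empty(empty_periods)
```

On the affected master build, the `pd.factorize` call inside the helper would be expected to raise the `ValueError` shown above rather than reach the assertions.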
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue on Jan 3, 2019
Setting an additional index on a DataFrame with an empty PeriodIndex raises a couple of exceptions:
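The original snippet is not preserved here. As a stand-in, this is a minimal sketch of the kind of call described, assuming a frame with a single empty column `a` and an index named `period` (both names are made up for illustration). It uses only `DataFrame.set_index(..., append=True)`, and the final state assumes a pandas version where the regression is fixed.

```python
import pandas as pd

# Hypothetical frame: one empty column on an empty daily PeriodIndex.
df = pd.DataFrame(
    {"a": []},
    index=pd.PeriodIndex([], freq="D", name="period"),
)

# append=True keeps the existing PeriodIndex and adds "a" as a second
# level, which means the empty PeriodIndex has to be factorized.
result = df.set_index("a", append=True)
```

With the fix in place, `result` should be an empty frame whose index is a two-level MultiIndex.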
Expected Output
Problem description
This works fine on 0.23.4:
Output of pd.show_versions()
INSTALLED VERSIONS
commit: b7294dd
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.0+4049.gb7294dd3e
pytest: 3.9.2
pip: 18.1
setuptools: 40.6.2
Cython: 0.28.5
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: 0.10.9
IPython: 7.1.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: None
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: 0.6.1+2.gd98c621
pandas_datareader: None
gcsfs: None