BUG: .shift() on IntervalArray column raises ValueError #26479

Closed
datapythonista opened this issue May 21, 2019 · 1 comment
Labels
Bug Duplicate Report Duplicate issue or pull request Interval Interval data type

Comments

@datapythonista
Member

Code Sample, a copy-pastable example if possible

>>> import pandas
>>> interval_array = pandas.arrays.IntervalArray.from_arrays([1, 2, 3], [4, 5, 6])
>>> interval_array
IntervalArray([(1, 4], (2, 5], (3, 6]],
              closed='right',
              dtype='interval[int64]')
>>> interval_series = pandas.Series(interval_array)
>>> interval_series
0    (1, 4]
1    (2, 5]
2    (3, 6]
dtype: interval
>>> interval_series.shift(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\series.py", line 3960, in shift
    fill_value=fill_value)
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\generic.py", line 9091, in shift
    fill_value=fill_value)
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\internals\managers.py", line 522, in shift
    return self.apply('shift', **kwargs)
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\internals\managers.py", line 395, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\internals\blocks.py", line 1828, in shift
    self.values.shift(periods=periods, fill_value=fill_value),
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\arrays\base.py", line 527, in shift
    dtype=self.dtype
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\arrays\interval.py", line 214, in _from_sequence
    return cls(scalars, dtype=dtype, copy=copy)
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\arrays\interval.py", line 159, in __new__
    verify_integrity=verify_integrity)
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\arrays\interval.py", line 177, in _simple_new
    left = left.astype(dtype.subtype)
  File "C:\msys64\home\UKC3153\src\pandas\pandas\core\indexes\numeric.py", line 331, in astype
    raise ValueError('Cannot convert NA to integer')
ValueError: Cannot convert NA to integer

Problem description

Calling .shift() on a Series backed by an IntervalArray with an integer subtype raises ValueError: Cannot convert NA to integer.
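
The traceback ends in a cast of the interval endpoints back to the integer subtype. The sketch below illustrates the underlying limitation with plain NumPy rather than pandas internals: an int64 array has no slot for NaN, while a float64 copy does. The variable names are illustrative only and are not taken from the pandas source.

```python
import numpy as np

# Stand-in for the left endpoints of an interval[int64] array.
left = np.array([1, 2, 3], dtype="int64")

# Writing a missing value into an int64 array fails.
try:
    left[0] = np.nan
    stored = True
except ValueError:
    stored = False

# A float64 copy can hold the missing value.
as_float = left.astype("float64")
as_float[0] = np.nan

print(stored)    # False
print(as_float)  # first element is nan, the rest are unchanged
```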

Expected Output

0       NaN
1    (1, 4]
2    (2, 5]
dtype: interval

Output of pd.show_versions()

INSTALLED VERSIONS

python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.25.0.dev0+592.gb563d452f
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: 0.11.1
xarray: 0.12.1
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: 0.3.0
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@datapythonista datapythonista added Bug Interval Interval data type labels May 21, 2019
@datapythonista datapythonista changed the title BUG: .shift() on IntervalArray column BUG: .shift() on IntervalArray column raises ValueError May 21, 2019
@jschendel
Member

Thanks, this is a dupe of #22428

This issue only occurs for subtypes that can't store NA (such as int64 here), so a workaround in the meantime is to cast to a float subtype prior to the shift:

In [1]: import pandas

In [2]: interval_array = pandas.arrays.IntervalArray.from_arrays([1, 2, 3], [4, 5, 6])

In [3]: interval_series = pandas.Series(interval_array)

In [4]: interval_series
Out[4]: 
0    (1, 4]
1    (2, 5]
2    (3, 6]
dtype: interval

In [5]: interval_series.astype('interval[float64]').shift(1)
Out[5]: 
0           NaN
1    (1.0, 4.0]
2    (2.0, 5.0]
dtype: interval

Note: IntervalArray doesn't support nullable integer (IntNA) subtypes yet, so there isn't a way to maintain integer endpoints when introducing NA.
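
For reuse, the workaround above could be wrapped in a small helper. This is a minimal sketch: the function name shift_interval_series is hypothetical and not part of pandas, and it assumes the input Series holds intervals with numeric endpoints. As noted, the result has float endpoints.

```python
import pandas as pd


def shift_interval_series(series, periods=1):
    """Shift an interval Series after casting its endpoints to float64.

    Hypothetical helper, not a pandas API. The cast makes room for the
    NaN that shift() introduces, at the cost of integer endpoints.
    """
    return series.astype("interval[float64]").shift(periods)


interval_array = pd.arrays.IntervalArray.from_arrays([1, 2, 3], [4, 5, 6])
interval_series = pd.Series(interval_array)

shifted = shift_interval_series(interval_series)
print(shifted)
```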

@jschendel jschendel added the Duplicate Report Duplicate issue or pull request label May 21, 2019