BUG: Pandas groupby datetime and column then apply generates ValueError #21651
Comments
Simpler example:
From the traceback, it looks like an issue in indexing a (Multi?)Index with dates. Investigation and PRs welcome!
@mroeschke did you mean In [17]: df.groupby([df.let, df.date, df.date]).apply(lambda x: x.iloc[0:])
No, the code above runs properly to reproduce the error (albeit I used a confusing column name).
@mroeschke I see that now, sorry, it was a syntax error on my side only!
I am also facing this issue. My column contained date strings, which I parsed using the dateutil parser. I created two new columns for date and time. Then I grouped by an id column and date, and tried to sort by time. Here is the code:
I get the following error:
Output of pd.show_versions():
INSTALLED VERSIONS
commit: None
pandas: 0.23.4
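For the use case described in this comment (ordering rows by time within each id and date), a minimal sketch that avoids groupby-then-apply altogether is shown here. It is not the commenter's code, which did not survive on this page: the column names `id`, `timestamp`, `date` and `time` and the sample rows are hypothetical, and `pd.to_datetime` stands in for the dateutil parsing step.

```python
import pandas as pd

# Hypothetical data: an id column and a column of date strings.
df = pd.DataFrame({
    "id": [1, 1, 2, 1],
    "timestamp": [
        "2018-07-01 10:00",
        "2018-07-01 09:00",
        "2018-07-01 08:00",
        "2018-07-02 07:00",
    ],
})

# Parse once, then derive the date and time columns.
parsed = pd.to_datetime(df["timestamp"])
df["date"] = parsed.dt.normalize()  # stays datetime64, not datetime.date objects
df["time"] = parsed.dt.time         # datetime.time objects (object dtype)

# Sorting on all three keys orders rows by time within each (id, date) pair,
# so no groupby/apply is needed for this step.
ordered = df.sort_values(["id", "date", "time"]).reset_index(drop=True)
print(ordered)
```

Keeping `date` as a normalized datetime64 column, rather than a column of `datetime.date` objects, sidesteps the object-dtype index that appears in the error message reported in this issue.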
I am also getting this error with the following example:
BY_MONTH = pd.Grouper(key='date', freq='M', axis=1)
df = pd.DataFrame({
    'date': pd.date_range(start='2000-01-01', freq='D', periods=100),
    'value': range(100)
})
ts = df.groupby(('value', BY_MONTH))['value'].mean()
I can confirm that this was working in 0.22.0, but started failing in 0.23.0
@richardbrks your example may be correctly raising. I believe we had a change that made tuples always refer to a label. If you do
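The rest of that suggestion is cut off on this page. One plausible reading, offered only as an assumption, is that the grouping keys should be passed as a list rather than a tuple, so that each element is treated as a separate key. The sketch below adapts the example above on that assumption. It drops `axis=1` and uses the month-start alias `'MS'` instead of `'M'`, because newer pandas releases rename the month-end alias to `'ME'`.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range(start="2000-01-01", freq="D", periods=100),
    "value": range(100),
})

# Assumption: 'MS' (month start) replaces the original 'M' so the sketch
# runs across pandas versions; the bin labels become the first day of each month.
by_month = pd.Grouper(key="date", freq="MS")

# A list means "two separate grouping keys"; a tuple would be read as one label.
ts = df.groupby(["value", by_month])["value"].mean()
print(ts.head())
```

The result is indexed by a two-level MultiIndex (`value`, `date`), one level per key in the list.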
This runs fine on the current master. So I think this can be closed.
If you would like to check whether we have a test for this, or add one, that would be great.
This seems not to be working in 1.0.5
@maxzinkus Yeah, this only started working in 1.1.0.
I can confirm that the issue is not reproducible with or after version 1.1.0.
Code Sample, a copy-pastable example if possible
Problem description
I am trying to group my dataframe and then apply a function to each row of the dataframe. SYM_ROOT is a category variable, while TIME_M is a datetime variable.
However, I keep getting the following error:
ValueError: Key 2017-01-03 00:00:00 not in level Index([2017-01-03], dtype='object', name=u'TIME_M')
I am referring to this stackoverflow post.
I think the reason is that pandas somehow doesn't recognize the date object in the column correctly when using the groupby statement. It incorrectly compares 2017-01-03 00:00:00 with TIME_M and generates the error. After separating out the date as a single column, the problem fixes itself. But I think it would be beneficial if pandas could recognize the date object correctly in the columns ...
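As a rough illustration of the mismatch described above, the sketch below builds a small frame with a categorical SYM_ROOT column and a TIME_M column of plain `datetime.date` objects (object dtype, as in the error message), then converts TIME_M to datetime64 before grouping. The sample rows and the `PRICE` column are hypothetical, and the conversion is one possible workaround rather than the fix that landed in pandas; per the comments in this thread, the original error no longer reproduces from 1.1.0 onward.

```python
import datetime as dt

import pandas as pd

# Hypothetical data shaped like the report: a categorical symbol column and
# a column of datetime.date objects, which pandas stores with object dtype.
df = pd.DataFrame({
    "SYM_ROOT": pd.Categorical(["A", "A", "B"]),
    "TIME_M": [dt.date(2017, 1, 3), dt.date(2017, 1, 3), dt.date(2017, 1, 4)],
    "PRICE": [1.0, 2.0, 3.0],
})

# Workaround sketch: make the key a real datetime64 column so the group
# keys and the resulting index level both hold Timestamps.
df["TIME_M"] = pd.to_datetime(df["TIME_M"])

# observed=True keeps only the category/date combinations present in the data.
first_price = (
    df.groupby(["SYM_ROOT", "TIME_M"], observed=True)["PRICE"]
    .apply(lambda s: s.iloc[0])
)
print(first_price)
```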
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None