Quantile function fails when performing groupby on Time Zone Aware Timestamps #33168
Comments
Very similar to #6409: this data type is probably not supported by many groupby operations. Thanks for the report! PRs and investigations welcome.
Hello @mroeschke, thank you for tagging this issue. Could you clarify in what sense issue #33168 is similar to #6409? I am open to investigating this issue further, but I really lack insight into how pandas works under the hood. Mainly, I can see that this call fails: https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L2267 The function is defined as follows: https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L2251 https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L2239 But then I am totally lost: where should I look next? It seems to me that
#33168 and #6409 are similar in the sense that datetime data types are not supported in groupby. Many of our Cython aggregation functions are located here: https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/groupby.pyx For this issue, as also described in #6409, we will need to convert these dates to integers before passing them to Cython, and then wrap the results back into their original data type.
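A minimal sketch of the conversion described above, done at the user level rather than inside pandas: the helper groupby_quantile_tz, the column names and the sample data are hypothetical and are not pandas internals. It encodes each tz-aware timestamp as int64 nanoseconds since the Unix epoch, lets an ordinary integer groupby quantile do the work, and wraps the result back into the column's original time zone.

```python
import pandas as pd

# Reference point for the integer encoding: nanoseconds since the Unix epoch (UTC).
EPOCH = pd.Timestamp("1970-01-01", tz="UTC")
ONE_NS = pd.Timedelta(1, unit="ns")


def groupby_quantile_tz(df, by, col, q=0.5):
    """Hypothetical helper: per-group quantile of a tz-aware datetime column.

    The timestamps are converted to int64 nanoseconds, aggregated as plain
    numbers, and wrapped back into the column's original time zone.
    """
    s = df[col]
    tz = s.dt.tz
    # 1. Dates -> integers (nanoseconds elapsed since EPOCH).
    as_int = (s.dt.tz_convert("UTC") - EPOCH) // ONE_NS
    # 2. Aggregate the integers; the interpolated quantile comes back as floats.
    q_float = as_int.groupby(df[by]).quantile(q)
    # 3. Interpolation can yield fractional nanoseconds: round back to int64.
    q_int = q_float.round().astype("int64")
    # 4. Integers -> tz-aware timestamps, converted back to the original zone.
    return (pd.to_timedelta(q_int, unit="ns") + EPOCH).dt.tz_convert(tz)


df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "timestamp": pd.date_range(
            "2020-03-30", periods=4, freq="D", tz="Europe/Brussels"
        ),
    }
)
result = groupby_quantile_tz(df, "key", "timestamp")
print(result)
```

Because the quantile of the integers is computed in floating point, the round trip can lose precision on the order of a few hundred nanoseconds for present-day timestamps, which is usually negligible for this kind of aggregation.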
At first glance, it appears this code sample gave the correct results in 0.25.3.
This looks okay on master. Could use a test
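A hedged sketch of such a regression test in pytest style, assuming the failing pattern is a groupby on a tz-aware timestamp column followed by quantile; the function names and data are made up for illustration. It checks that tz-aware and tz-naive group keys give the same quantiles, which is the behaviour the report expects.

```python
import pandas as pd


def quantiles_by_timestamp(tz):
    """Group hypothetical readings by an hourly timestamp key and take the median."""
    keys = pd.DatetimeIndex(
        [
            "2020-04-19 00:00",
            "2020-04-19 00:00",
            "2020-04-19 01:00",
            "2020-04-19 01:00",
        ],
        tz=tz,  # None -> tz-naive keys, "UTC" -> tz-aware keys
    )
    df = pd.DataFrame({"timestamp": keys, "value": [1.0, 3.0, 10.0, 20.0]})
    return df.groupby("timestamp").quantile(0.5)


def test_groupby_quantile_tz_aware_matches_naive():
    aware = quantiles_by_timestamp("UTC")
    naive = quantiles_by_timestamp(None)
    # The quantiles must not depend on whether the group keys carry a time zone.
    assert aware["value"].tolist() == naive["value"].tolist() == [2.0, 15.0]
    # The tz-aware group keys should survive the aggregation.
    assert aware.index.tz is not None
```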
Hi @mroeschke, is this issue still open for adding a test case? If so, can I pick it up?
Go for it!
take
Code Sample, a copy-pastable example if possible
This may not be a high-priority bug, but I have the feeling it can be fixed easily; I just do not understand well enough how it should be fixed. Please find below the MCVE to reproduce it:
Problem description
The traceback of the error is rather terse, and I do not have enough experience with the pandas source code to understand all the details of this error:
I have found similar issues on GitHub with the same exception, but I think the exception is too generic for them to be the same problem. Additionally, I may have found a simple corner case involving tz-aware timestamps.
I had a hard time reproducing the error when building the MCVE; I finally found out that it is related to the existence of an extra column holding time-zone-aware timestamps.
Maybe the fix is just a matter of updating a function signature to accept tz-aware timestamps.
The problem can be circumvented using one of the following formulations:
Or:
Or:
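One workaround in this spirit, sketched under the assumption that the tz-aware column is not itself needed in the aggregation (the column names and data are hypothetical), is to keep that column out of the quantile call, either by selecting the numeric columns explicitly or by dropping the timestamp column before grouping:

```python
import pandas as pd

# Hypothetical frame: a group key, a numeric column and an extra tz-aware column.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "value": [1.0, 3.0, 10.0, 20.0],
        "timestamp": pd.date_range("2020-03-30", periods=4, freq="D", tz="UTC"),
    }
)

# Option 1: restrict the aggregation to the numeric column(s) explicitly.
by_selection = df.groupby("key")[["value"]].quantile(0.5)

# Option 2: drop the tz-aware column before grouping.
by_dropping = df.drop(columns="timestamp").groupby("key").quantile(0.5)

print(by_selection)
```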
This strongly suggests that it is the existence of the extra tz-aware column timestamp that makes the quantile function fail.
Expected Output
The expected output is that groupby operations behave the same on a DataFrame holding tz-aware timestamps as they do on one holding tz-naive timestamps.
Note: Thank you for building such a great tool; pandas is a first-class piece of middleware. Your efforts are strongly appreciated. Let me know how I can help; I would be happy to understand how this can be corrected.
Output of pd.show_versions()
pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 46.1.3
Cython : 0.29.14
pytest : 5.3.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 0.999999999
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
numba : None