DataFrameGroupBy.quantile raises for non-numeric dtypes rather than dropping columns #27892

TomAugspurger · 2019-08-13T02:22:42Z

In pandas 0.24.x, we had:

In [1]: import pandas as pd

In [2]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
Out[2]:
Empty DataFrame
Columns: []
Index: []

In 0.25.0, we have:

In [3]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-8152ffc0932b> in <module>
----> 1 pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()

~/sandbox/pandas/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
   1908             post_processing=post_processor,
   1909             q=q,
-> 1910             interpolation=interpolation,
   1911         )
   1912

~/sandbox/pandas/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, grouper, aggregate, cython_dtype, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
   2236                 vals = obj.values
   2237                 if pre_processing:
-> 2238                     vals, inferences = pre_processing(vals)
   2239                 func = partial(func, vals)
   2240

~/sandbox/pandas/pandas/core/groupby/groupby.py in pre_processor(vals)
   1875             if is_object_dtype(vals):
   1876                 raise TypeError(
-> 1877                     "'quantile' cannot be performed against 'object' dtypes!"
   1878                 )
   1879

TypeError: 'quantile' cannot be performed against 'object' dtypes!

This is most relevant for DataFrames with mixed dtypes, where a single object column makes the whole call fail:

In [6]: df = pd.DataFrame({"A": [0, 1], 'B': ['a', 'b']})

In [7]: df.groupby([0, 1]).quantile(0.5)

...

TypeError: 'quantile' cannot be performed against 'object' dtypes!
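Until this is fixed, one possible workaround is to restrict the frame to numeric columns before grouping. This is only a sketch and an assumption on my part, not something proposed in the thread, and it is not equivalent to the 0.24.x behavior (which attempted every column and skipped the ones that failed).

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1], "B": ["a", "b"]})

# Keep only numeric columns so the string column "B" never reaches
# the groupby quantile implementation.
numeric = df.select_dtypes(include="number")

# Same grouping labels as in the report: one label per row.
result = numeric.groupby([0, 1]).quantile(0.5)
print(result)
```

Only column "A" appears in the result; "B" is excluded before the groupby rather than dropped by it.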

TomAugspurger · 2019-08-13T02:24:36Z

Unfortunately, we can't just exclude object dtype. We apparently used to attempt the quantile on every column and catch any exceptions:

# in 0.24.2
In [3]: pd.DataFrame({"A": ['a', 'b']}, dtype=object).groupby([0, 0]).quantile()
Out[3]:
Empty DataFrame
Columns: []
Index: []

I don't know if that behavior is worth preserving.

guilherme-salome · 2019-08-16T18:38:33Z

What is the desired behavior here? Some possibilities:

Drop the columns that have an object data type;
Drop the columns that have an object data type, but warn the user about the use of quantile (something similar to numpy warnings when applying log to a list that contains zero);
Do not drop the column with object data type, but return a column with NaNs;
Interrupt execution.
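To make the second option concrete, here is a rough user-level sketch. The helper name `quantile_drop_object_with_warning` and its parameters are hypothetical and not part of pandas; it only illustrates "drop object columns and warn", not how pandas would implement it internally.

```python
import warnings

import pandas as pd


def quantile_drop_object_with_warning(df, labels, q=0.5):
    """Hypothetical helper: drop object-dtype columns, warn, then take the quantile."""
    object_cols = list(df.select_dtypes(include=["object"]).columns)
    if object_cols:
        warnings.warn(
            f"Dropping object-dtype columns before quantile: {object_cols}",
            UserWarning,
            stacklevel=2,
        )
    kept = df.drop(columns=object_cols)
    # Align the group labels with the frame's index, one label per row.
    keys = pd.Series(list(labels), index=df.index)
    return kept.groupby(keys).quantile(q)


df = pd.DataFrame({"A": [0, 1], "B": pd.Series(["a", "b"], dtype=object)})
result = quantile_drop_object_with_warning(df, [0, 1])
print(result)
```

The third option (keep the column but fill it with NaN) would differ only in what happens to `object_cols` after the quantile is computed.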

TomAugspurger · 2019-08-19T17:13:30Z

Ideally, we would match the 0.24.2 behavior. That can roughly be described as "attempt the quantile, but skip any columns that raise an error". But that may not be easily doable with the new quantile implementation.
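Expressed at the user level, "attempt the quantile, but skip any columns that raise an error" looks roughly like the sketch below. The function `quantile_skipping_failures` is a hypothetical name for illustration; this is not the 0.24.2 implementation, and the broad `except Exception` is deliberate only because it mirrors the description above.

```python
import pandas as pd


def quantile_skipping_failures(df, labels, q=0.5):
    """Hypothetical helper: per-column groupby quantile, skipping columns that raise."""
    # Align the group labels with the frame's index, one label per row.
    keys = pd.Series(list(labels), index=df.index)
    pieces = {}
    for name in df.columns:
        try:
            pieces[name] = df[name].groupby(keys).quantile(q)
        except Exception:
            # The column cannot be quantiled (for example, strings): skip it.
            continue
    if not pieces:
        return pd.DataFrame()
    return pd.DataFrame(pieces)


mixed = pd.DataFrame({"A": [0, 1], "B": ["a", "b"]})
print(quantile_skipping_failures(mixed, [0, 1]))

only_strings = pd.DataFrame({"A": ["a", "b"]})
print(quantile_skipping_failures(only_strings, [0, 0]))
```

For the mixed frame only "A" survives; for the all-string frame the result is an empty DataFrame, matching the shape of the 0.24.x output shown at the top of the issue.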

TomAugspurger · 2019-08-19T17:14:07Z

I'm not sure anyone will get to this before 0.25.1. I'll leave it at that milestone, but we can push if needed.

guilherme-salome · 2019-08-19T17:51:53Z

When will 0.25.1 be released? I will try to understand what is happening with the quantile function.

TomAugspurger · 2019-08-19T18:29:47Z

0.25.1 is targeted for this Wednesday.


TomAugspurger · 2019-11-12T17:20:01Z

@WillAyd do you think this is doable for 1.0? You had a recent refactor for quantile, right?

WillAyd · 2019-11-12T17:35:35Z

Haven’t looked at this. I don’t object to pushing unless a community PR picks it up.


yanxueotft · 2019-12-05T02:20:56Z

Looks like 0.25.3 still has the non-numeric issue.

TomAugspurger · 2019-12-30T16:01:43Z

Pushing.

TomAugspurger added this to the 0.25.1 milestone Aug 13, 2019

TomAugspurger added the Groupby and Regression (functionality that used to work in a prior pandas version) labels Aug 13, 2019

TomAugspurger changed the title ~~Quantile raises for object type rather than dropping columns~~ DataFrameGroupBy.quantile raises for object type rather than dropping columns Aug 13, 2019

TomAugspurger changed the title ~~DataFrameGroupBy.quantile raises for object type rather than dropping columns~~ DataFrameGroupBy.quantile raises for non-numeric dtypes rather than dropping columns Aug 13, 2019

TomAugspurger modified the milestones: 0.25.1, 1.0 Aug 20, 2019

jbrockmendel added the quantile label Oct 22, 2019

TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019

mroeschke added the Bug label Apr 5, 2020

rhshadrach mentioned this issue Jun 13, 2020

BUG: DataFrameGroupBy.quantile raises for non-numeric dtypes rather than dropping columns #34756

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.1 Jun 24, 2020

jreback closed this as completed in #34756 Jul 16, 2020
