DataFrameGroupBy.quantile raises for non-numeric dtypes rather than dropping columns #27892

Closed
TomAugspurger opened this issue Aug 13, 2019 · 10 comments · Fixed by #34756
Labels: Bug; Groupby; quantile (quantile method); Regression (Functionality that used to work in a prior pandas version)
@TomAugspurger
Contributor

TomAugspurger commented Aug 13, 2019

In pandas 0.24.x, we had

In [1]: import pandas as pd

In [2]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
Out[2]:
Empty DataFrame
Columns: []
Index: []

In 0.25.0, we have

In [3]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-8152ffc0932b> in <module>
----> 1 pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()

~/sandbox/pandas/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
   1908             post_processing=post_processor,
   1909             q=q,
-> 1910             interpolation=interpolation,
   1911         )
   1912

~/sandbox/pandas/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, grouper, aggregate, cython_dtype, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
   2236                 vals = obj.values
   2237                 if pre_processing:
-> 2238                     vals, inferences = pre_processing(vals)
   2239                 func = partial(func, vals)
   2240

~/sandbox/pandas/pandas/core/groupby/groupby.py in pre_processor(vals)
   1875             if is_object_dtype(vals):
   1876                 raise TypeError(
-> 1877                     "'quantile' cannot be performed against 'object' dtypes!"
   1878                 )
   1879

TypeError: 'quantile' cannot be performed against 'object' dtypes!

This is most relevant for mixed-dtype DataFrames, where the numeric columns can no longer be aggregated either:

In [6]: df = pd.DataFrame({"A": [0, 1], 'B': ['a', 'b']})

In [7]: df.groupby([0, 1]).quantile(0.5)

...

TypeError: 'quantile' cannot be performed against 'object' dtypes!
@TomAugspurger TomAugspurger added this to the 0.25.1 milestone Aug 13, 2019
@TomAugspurger TomAugspurger added the Groupby and Regression (Functionality that used to work in a prior pandas version) labels Aug 13, 2019
@TomAugspurger TomAugspurger changed the title Quantile raises for object type rather than dropping columns DataFrameGroupBy.quantile raises for object type rather than dropping columns Aug 13, 2019
@TomAugspurger TomAugspurger changed the title DataFrameGroupBy.quantile raises for object type rather than dropping columns DataFrameGroupBy.quantile raises for non-numeric dtypes rather than dropping columns Aug 13, 2019
@TomAugspurger
Contributor Author

Unfortunately, we can't just exclude object dtype. We apparently used to attempt the quantile and catch any exceptions:

# in 0.24.2
In [3]: pd.DataFrame({"A": ['a', 'b']}, dtype=object).groupby([0, 0]).quantile()
Out[3]:
Empty DataFrame
Columns: []
Index: []

I don't know if that behavior is worth preserving.

@guilherme-salome
Contributor

guilherme-salome commented Aug 16, 2019

What is the desired behavior here? Some possibilities:

  • Drop the columns that have an object data type;
  • Drop the columns that have an object data type, but warn the user about the use of quantile (something similar to numpy warnings when applying log to a list that contains zero);
  • Do not drop the column with object data type, but return a column with NaNs;
  • Interrupt execution.
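
The first two options above can be sketched as a user-side workaround. This is only an illustration, not pandas' implementation: the helper name `quantile_dropping_object` is hypothetical, and it assumes that dropping by dtype is acceptable, which means object columns that happen to hold numbers are dropped as well.

```python
import warnings

import pandas as pd
from pandas.api.types import is_object_dtype


def quantile_dropping_object(df, by, q=0.5):
    """Hypothetical helper: drop object-dtype columns, warn, then take the group quantile."""
    # Collect the columns whose dtype is object (for example, Python strings).
    object_cols = [col for col in df.columns if is_object_dtype(df[col].dtype)]
    if object_cols:
        warnings.warn(
            f"Dropping object-dtype columns before quantile: {object_cols}",
            UserWarning,
            stacklevel=2,
        )
    # Only the remaining columns are passed to the groupby quantile.
    return df.drop(columns=object_cols).groupby(by).quantile(q)


# "B" is forced to object dtype so the example does not depend on string-dtype defaults.
df = pd.DataFrame({"A": [0, 1], "B": pd.Series(["a", "b"], dtype=object)})
result = quantile_dropping_object(df, [0, 1], q=0.5)
print(result)
```

Removing the `warnings.warn` call gives the first option (silent drop); keeping it gives the second.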

@TomAugspurger
Contributor Author

Ideally, we would match the 0.24.2 behavior. That can roughly be described as "attempt the quantile, but skip any columns that raise an error". But that may not be easily doable with the new quantile implementation.
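
A rough user-side sketch of "attempt the quantile, but skip any columns that raise an error" is shown here. It is not the 0.24.2 code path: the helper name `groupby_quantile_skip_errors` is hypothetical, and it assumes that a failing column surfaces as a `TypeError`, as in the traceback at the top of this issue.

```python
import pandas as pd


def groupby_quantile_skip_errors(df, by, q=0.5):
    """Hypothetical helper: compute the group quantile column by column, skipping failures."""
    results = {}
    for name in df.columns:
        try:
            # Attempt the quantile on this column alone.
            results[name] = df[name].groupby(by).quantile(q)
        except TypeError:
            # Assumption: unsupported dtypes raise TypeError; such columns are skipped.
            continue
    # Reassemble the columns that succeeded; empty if none did.
    return pd.DataFrame(results)


# "B" is forced to object dtype so the example does not depend on string-dtype defaults.
df = pd.DataFrame({"A": [0, 1], "B": pd.Series(["a", "b"], dtype=object)})
result = groupby_quantile_skip_errors(df, [0, 1], q=0.5)
print(result)
```

Unlike filtering by dtype up front, this keeps any column for which the quantile actually succeeds, at the cost of one groupby pass per column.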

@TomAugspurger
Contributor Author

I'm not sure anyone will get to this before 0.25.1. I'll leave it at that milestone, but we can push if needed.

@guilherme-salome
Contributor

When will 0.25.1 be released? I will try to understand what is happening with the quantile function.

@TomAugspurger
Contributor Author

TomAugspurger commented Aug 19, 2019 via email

@TomAugspurger TomAugspurger modified the milestones: 0.25.1, 1.0 Aug 20, 2019
@jbrockmendel jbrockmendel added the quantile (quantile method) label Oct 22, 2019
@TomAugspurger
Contributor Author

@WillAyd do you think this is doable for 1.0? You had a recent refactor for quantile, right?

@WillAyd
Member

WillAyd commented Nov 12, 2019 via email

@yanxueotft

looks like 0.25.3 still has the non-numeric issue...

@TomAugspurger
Contributor Author

Pushing.

@TomAugspurger TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019
@mroeschke mroeschke added the Bug label Apr 5, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Jun 24, 2020
7 participants