debnathshoham
Member

@debnathshoham debnathshoham commented Aug 8, 2021

@jreback jreback added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays. labels Aug 8, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3.2 milestone Aug 9, 2021
@simonjayhawkins
Member

needs a release note. fixing a regression so target 1.3.2

@jreback jreback merged commit 14cf6e2 into pandas-dev:master Aug 10, 2021
@jreback
Contributor

jreback commented Aug 10, 2021

thanks @debnathshoham

@jreback
Contributor

jreback commented Aug 10, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app

lumberbot-app bot commented Aug 10, 2021

Something went wrong ... Please have a look at my logs.

@debnathshoham debnathshoham deleted the gh42626 branch August 10, 2021 20:07
jreback pushed a commit that referenced this pull request Aug 10, 2021
…ies when results are floats (#42974)

Co-authored-by: Shoham Debnath <[email protected]>
@simonjayhawkins
Member

In summary, and for the record, this changed behavior from 1.2.5.

1.2.5 always returned an object-dtype Series of Python floats when q was list-like. If q was a scalar, the return type was always a Python float (as documented: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.quantile.html).

>>> pd.__version__
'1.2.5'
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0, 0.75])
>>> result
0.00    1.0
0.75    2.5
dtype: object
>>> 
>>> type(result[0])
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0])
>>> result
0.0    1.0
dtype: object
>>> 
>>> type(result[0])
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0.75])
>>> result
0.75    2.5
dtype: object
>>> 
>>> type(result[0.75])
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0.75)
>>> result
2.5
>>> 
>>> type(result)
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0)
>>> result
1.0
>>> 
>>> type(result)
<class 'float'>

1.3.2 now returns a nullable integer (Int64) or NumPy float (float64) Series, depending on the values in the result, when q is list-like. If q is a scalar, the return type is now a NumPy float or a NumPy int, which is inconsistent with the documentation.

>>> pd.__version__
'1.4.0.dev0+415.g99cf794ae2'
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0, 0.75])
>>> result
0.00    1.0
0.75    2.5
dtype: float64
>>> 
>>> type(result[0])
<class 'numpy.float64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0])
>>> result
0.0    1
dtype: Int64
>>> 
>>> type(result[0])
<class 'numpy.int64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0.75])
>>> result
0.75    2.5
dtype: float64
>>> 
>>> type(result[0.75])
<class 'numpy.float64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0.75)
>>> result
2.5
>>> 
>>> type(result)
<class 'numpy.float64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0)
>>> result
1
>>> 
>>> type(result)
<class 'numpy.int64'>
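
For callers who need one return type regardless of version, a minimal sketch follows. It is not part of this PR: `quantile_as_float` is a hypothetical helper name, and it assumes the quantile results contain no missing values (for example, the Series has at least one non-NA entry).

```python
import pandas as pd
from pandas.api.types import is_list_like


def quantile_as_float(series, q):
    """Hypothetical helper: coerce Series.quantile output to plain floats."""
    result = series.quantile(q)
    if is_list_like(q):
        # List-like q gives a Series; per the sessions above its dtype has
        # been object, Int64 or float64, so cast it to float64 explicitly.
        return result.astype("float64")
    # Scalar q gives a Python float, numpy.float64 or numpy.int64;
    # float() accepts all three (assumes the result is not pd.NA).
    return float(result)


s = pd.Series([1, 2, 3, pd.NA], dtype="Int64")
print(type(quantile_as_float(s, 0)))
print(quantile_as_float(s, [0, 0.75]).dtype)
```

The explicit cast trades the nullable dtype for a predictable one, which may or may not suit code that relies on Int64 results.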

feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
Development

Successfully merging this pull request may close these issues.

BUG: Cannot calculate quantiles from Int64Dtype Series when results are floats
4 participants