DOC: Fixed example & description for pandas.cut #20069

ikoevska · 2018-03-09T06:06:34Z

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

################################################################################
############################ Docstring (pandas.cut) ############################
################################################################################

Return indices of half-open `bins` to which each value of `x` belongs.

Use `cut` when you need to segment and sort data values into bins or
buckets of data. This function is also useful for going from a continuous
variable to a categorical variable. For example, `cut` could convert ages
to groups of age ranges.

Parameters
----------
x : array-like
    Input array to be binned. It has to be 1-dimensional.
bins : int, sequence of scalars, or pandas.IntervalIndex
    If `bins` is an int, defines the number of equal-width bins in the
    range of `x`. The range of `x` is extended by .1% on each side to
    include the min or max values of `x`.
    If `bins` is a sequence, defines the bin edges allowing for
    non-uniform bin width. No extension of the range of `x` is done.
right : bool, optional, default 'True'
    Indicates whether the `bins` include the rightmost edge or not. If
    `right == True` (the default), then the `bins` [1,2,3,4] indicate
    (1,2], (2,3], (3,4].
labels : array or bool, optional
    Used as labels for the resulting `bins`. Must be of the same length as
    the resulting `bins`. If False, returns only integer indicators of the
    `bins`.
retbins : bool, optional, default 'False'
    Whether to return the `bins` or not. Useful when `bins` is provided
    as a scalar.
precision : int, optional, default '3'
    The precision at which to store and display the `bins` labels.
include_lowest : bool, optional, default 'False'
    Whether the first interval should be left-inclusive or not.

Returns
-------
out : pandas.Categorical or Series, or array of int if `labels` is 'False'
    The return type depends on the input.
    If the input is a Series, a Series of type category is returned.
    Else - pandas.Categorical is returned. `Bins` are represented as
    categories when categorical data is returned.
bins : numpy.ndarray of floats
    Returned only if `retbins` is 'True'.

See Also
--------
qcut : Discretize variable into equal-sized buckets based on rank
    or based on sample quantiles.
pandas.Categorical : Represents a categorical variable in
    classic R / S-plus fashion.
Series : One-dimensional ndarray with axis labels (including time series).
pandas.IntervalIndex : Immutable Index implementing an ordered,
    sliceable set. IntervalIndex represents an Index of intervals that
    are all closed on the same side.

Notes
-----
Any NA values will be NA in the result. Out of bounds values will be NA in
the resulting pandas.Categorical object.

Examples
--------
>>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3, retbins=True)
... # doctest: +ELLIPSIS
([(0.19, 3.367], (0.19, 3.367], (0.19, 3.367], (3.367, 6.533], ...
Categories (3, interval[float64]): [(0.19, 3.367] < (3.367, 6.533] ...

>>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]),
...        3, labels=["good", "medium", "bad"])
... # doctest: +SKIP
[good, good, good, medium, bad, good]
Categories (3, object): [good < medium < bad]

>>> pd.cut(np.ones(5), 4, labels=False)
array([1, 1, 1, 1, 1], dtype=int64)

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.cut" correct. :)
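
The `right` and `include_lowest` parameters and the out-of-bounds note in the docstring above can be checked with a small sketch. The input array and bin edges here are made up for illustration and are not part of the PR; `codes` is used only as a compact way to see which bin each value landed in (-1 marks NA).

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the PR).
x = np.array([1, 2, 3, 4, 5])
edges = [1, 2, 3, 4]

# Default right=True: bins are (1, 2], (2, 3], (3, 4].
# 1 sits on the open left edge and 5 is out of bounds, so both become NA.
right_closed = pd.cut(x, edges)

# right=False: bins are [1, 2), [2, 3), [3, 4).
# Now 4 sits on the open right edge, so 4 and 5 become NA.
left_closed = pd.cut(x, edges, right=False)

# include_lowest=True makes the first interval left-inclusive, so 1 is kept.
with_lowest = pd.cut(x, edges, include_lowest=True)

print(right_closed.codes)  # -1 marks values that fell outside every bin
print(left_closed.codes)
print(with_lowest.codes)
```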

jorisvandenbossche

Thanks for the PR! Already added a few comments, will take a closer look later.

For the examples, can you start with a default example, i.e. one not using retbins=True? After that you can explicitly show the difference when you pass retbins=True. I would also show an example with a Series to illustrate the return type explanation.
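
A minimal sketch of the ordering suggested here, reusing the array from the docstring examples: a default call first, then `retbins=True`, then a Series input. The index labels on the Series are an assumption added for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([0.2, 1.4, 2.5, 6.2, 9.7, 2.1])

# Default call: only the binned values come back, as a pandas.Categorical.
binned = pd.cut(values, 3)

# With retbins=True the result is a tuple: (binned values, bin edges).
binned_again, edges = pd.cut(values, 3, retbins=True)

# Series input: the result is a Series of category dtype that keeps the
# original index (the labels "a".."f" are illustrative).
s = pd.Series(values, index=list("abcdef"))
binned_series = pd.cut(s, 3)

print(type(binned).__name__)
print(edges)
print(binned_series.dtype)
```

Three bins need four edges, so `edges` has one more element than the number of categories.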

jorisvandenbossche · 2018-03-09T07:53:46Z

pandas/core/reshape/tile.py

+    Use `cut` when you need to segment and sort data values into bins or
+    buckets of data. This function is also useful for going from a continuous
+    variable to a categorical variable. For example, `cut` could convert ages
+    to groups of age ranges.


Nice explanation!

jorisvandenbossche · 2018-03-09T08:00:20Z

pandas/core/reshape/tile.py

@@ -24,53 +24,64 @@
 def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
        include_lowest=False):
    """
-    Return indices of half-open bins to which each value of `x` belongs.
+    Return indices of half-open `bins` to which each value of `x` belongs.


I know it was already there, but I am wondering if we can make this first sentence better, because I have to read it very carefully to actually understand it :)

Some ideas:

"Return indices" is actually not correct? As it just returns the intervals or bins themselves?

Looking at https://en.wikipedia.org/wiki/Discretization_of_continuous_features and https://en.wikipedia.org/wiki/Data_binning, maybe something with the terms "convert continuous values into discrete bins"?
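
To illustrate the point about "indices": by default `cut` hands back the intervals themselves, and integer bin indicators appear only with `labels=False`. The ages and edges below are assumptions chosen to match the "ages to age ranges" wording in the docstring.

```python
import numpy as np
import pandas as pd

# Illustrative ages and bin edges (assumptions, not from the PR).
ages = np.array([5, 17, 30, 64, 90])
edges = [0, 18, 65, 120]

# Default: each value is mapped to an Interval such as (0, 18].
as_intervals = pd.cut(ages, edges)

# labels=False: each value is mapped to the integer position of its bin.
as_codes = pd.cut(ages, edges, labels=False)

print(as_intervals[0])  # a pandas.Interval, not an integer
print(as_codes)
```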

jorisvandenbossche · 2018-03-09T08:01:17Z

pandas/core/reshape/tile.py

+        include the min or max values of `x`.
+        If `bins` is a sequence, defines the bin edges allowing for
+        non-uniform bin width. No extension of the range of `x` is done.
+    right : bool, optional, default 'True'


You can leave out the "optional" here (it's indeed optional to specify it, but it has a default value, so it is not purely optional).

(The same goes for the ones below where you have both 'optional' and 'default ..'.)

codecov · 2018-03-09T11:52:07Z

Codecov Report

Merging #20069 into master will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20069      +/-   ##
==========================================
+ Coverage   91.69%   91.72%   +0.02%     
==========================================
  Files         150      150              
  Lines       49112    49112              
==========================================
+ Hits        45035    45047      +12     
+ Misses       4077     4065      -12

Flag	Coverage Δ
#multiple	`90.1% <ø> (+0.02%)`	⬆️
#single	`41.86% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/reshape/tile.py	`92.94% <ø> (ø)`	⬆️
pandas/plotting/_converter.py	`66.81% <0%> (+1.73%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1d73cf3...db337c1.

pep8speaks · 2018-03-10T09:46:51Z

Hello @ikoevska! Thanks for updating the PR.

In the file pandas/core/indexes/base.py, following are the PEP8 issues :

Line 1159:15: W291 trailing whitespace

ikoevska · 2018-03-10T09:53:07Z

I am closing this one as it got totally messed up after a rebase. Will open a new one with @jorisvandenbossche's comments applied.

jorisvandenbossche · 2018-03-10T10:14:49Z

@ikoevska For future reference, please update this PR. Even if you make a new branch locally, you can force push to the same branch on your fork, and this PR gets updated. But no problem for this time!

ikoevska added 5 commits March 9, 2018 00:19

Reworked doc string for pandas.cut

8a90d6d

Fixed example and extended descr

690dbc5

DOC: Fixed example & description for pandas.cut

00f35fb

Merge branch 'patch-1' of https://github.com/ikoevska/pandas into pat…

f277f15

…ch-1

DOC: Fixed issues with panda.cut after flake8

54df8d3

jorisvandenbossche added the Docs label Mar 9, 2018

jorisvandenbossche reviewed Mar 9, 2018

alysivji and others added 5 commits March 9, 2018 09:19

DOC: Improve docstring for pandas.Index.repeat (pandas-dev#19985)

747501a

Temporary github PR template for sprint (pandas-dev#20055)

9119d07

DOC: Update Kurt Docstr (pandas-dev#20044)

c730d08

BUG: Retain timezone dtype with cut and qcut (pandas-dev#19890)

cc1b934

Fix typo in apply.py (pandas-dev#20058)

731d971

kylebarron and others added 10 commits March 9, 2018 10:31

DOC: Add syntax highlighting to SAS code blocks in comparison_with_sa…

7c14e4f

…s.rst (pandas-dev#20080) * Add syntax highlighting to SAS code blocks * Fix typo

TST: series/indexing tests parametrization + moving test methods (pan…

ed96567

…das-dev#20059)

Added 'displayed_only' option to 'read_html' (pandas-dev#20047)

bd31f71

Refactored GroupBy ASVs (pandas-dev#20043)

da6f827

Cythonized GroupBy pct_change (pandas-dev#19919)

52cffa3

DOC: Extend docstring pandas core index to_frame method (pandas-dev#2…

4131149

…0036)

Reworked doc string for pandas.cut

2e6b4b1

Fixed example and extended descr

1d392e4

DOC: Fixed issues with panda.cut after flake8

2387be9

Merge branch 'patch-1' of https://github.com/ikoevska/pandas into pat…

db337c1

…ch-1

ikoevska closed this Mar 10, 2018

ikoevska mentioned this pull request Mar 10, 2018

DOC: Update pandas.cut docstring #20104

Merged

ikoevska deleted the patch-1 branch June 9, 2018 11:34
