DOC: Fixed example & description for pandas.cut #20069
Thanks for the PR! I've already added a few comments and will take a closer look later.
For the examples, can you start with a default example, i.e. one not using `retbins=True`? After that you can explicitly show the difference when you pass `retbins=True`. I would also show an example with a Series to illustrate the return type explanation.
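A minimal sketch of the example order suggested above, not the wording that ended up in the docstring. The sample values and the variable names (`ages`, `binned`, `edges`, `binned_series`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
ages = np.array([1, 7, 5, 4, 6, 3])

# Default call: three equal-width bins, returned as a Categorical
# because the input is array-like.
binned = pd.cut(ages, 3)

# With retbins=True the call returns a tuple: (binned values, bin edges).
binned_again, edges = pd.cut(ages, 3, retbins=True)

# A Series input gives back a Series of categorical dtype
# that keeps the original index.
s = pd.Series(ages, index=list("abcdef"))
binned_series = pd.cut(s, 3)

print(type(binned).__name__)
print(edges)
print(binned_series)
```

Starting with the plain call keeps the first example focused on what `cut` does; the tuple return of `retbins=True` and the Series return type then read as variations on it.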
Use `cut` when you need to segment and sort data values into bins or
buckets of data. This function is also useful for going from a continuous
variable to a categorical variable. For example, `cut` could convert ages
to groups of age ranges.
Nice explanation!
@@ -24,53 +24,64 @@
def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
include_lowest=False):
"""
- Return indices of half-open bins to which each value of `x` belongs.
+ Return indices of half-open `bins` to which each value of `x` belongs.
I know it was already there, but I'm wondering if we can make this first sentence better, because I have to read it very carefully to actually understand it :)
Some ideas:
- "Return indices" is actually not correct, is it? It just returns the intervals or bins themselves.
- Looking at https://en.wikipedia.org/wiki/Discretization_of_continuous_features and https://en.wikipedia.org/wiki/Data_binning, maybe something using the terms "convert continuous values into discrete bins"?
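To make the point about "indices" concrete, here is a small sketch, assuming made-up ages and bin edges (`ages`, `edges`, and the label names are illustrative only). By default `cut` returns the intervals themselves; integer bin indices only appear when `labels=False` is passed.

```python
import pandas as pd

# Hypothetical continuous values and bin edges.
ages = pd.Series([3, 17, 35, 62, 80])
edges = [0, 18, 65, 100]

# Default: each value is mapped to the interval (bin) it falls into.
intervals = pd.cut(ages, bins=edges)

# labels=False: each value is mapped to the integer index of its bin.
codes = pd.cut(ages, bins=edges, labels=False)

# Custom labels: each value is mapped to a named category.
named = pd.cut(ages, bins=edges, labels=["child", "adult", "senior"])

print(intervals)
print(codes.tolist())
print(named.tolist())
```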
include the min or max values of `x`.
If `bins` is a sequence, defines the bin edges allowing for
non-uniform bin width. No extension of the range of `x` is done.
right : bool, optional, default 'True'
You can leave out the "optional" here (it's indeed optional to specify it, but it has a default value, so it is not purely optional).
The same goes for the ones below where you have both 'optional' and 'default ...'.
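For readers unfamiliar with the `right` parameter discussed here, a short sketch of its effect when `bins` is a sequence of edges. The values and edges are made up; `labels=False` is used only so the results are easy to compare as bin indices.

```python
import pandas as pd

# Hypothetical values that sit exactly on the bin edges.
values = [1, 2, 3]
edges = [1, 2, 3]

# right=True (the default): bins are (1, 2] and (2, 3].
# The value 1 lies on the open left edge, so it falls in no bin.
right_codes = pd.cut(values, bins=edges, labels=False)

# right=False: bins are [1, 2) and [2, 3).
# Now the value 3 lies on the open right edge and falls in no bin.
left_codes = pd.cut(values, bins=edges, right=False, labels=False)

print(right_codes)
print(left_codes)
```

Because the range of `x` is not extended when explicit edges are given, values on the open side of the outermost bin come back as NaN.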
Codecov Report
@@            Coverage Diff             @@
##           master   #20069      +/-   ##
==========================================
+ Coverage   91.69%   91.72%   +0.02%
==========================================
  Files         150      150
  Lines       49112    49112
==========================================
+ Hits        45035    45047      +12
+ Misses       4077     4065      -12
…s.rst (pandas-dev#20080) * Add syntax highlighting to SAS code blocks * Fix typo
Hello @ikoevska! Thanks for updating the PR.
I am closing this one as it got totally messed up after a rebase. I will open a new one with @jorisvandenbossche's comments applied.
@ikoevska For future reference, please update the existing PR instead of opening a new one. Even if you make a new branch locally, you can force push to the same branch on your fork, and this PR gets updated. But no problem for this time!
git diff upstream/master -u -- "*.py" | flake8 --diff