DataFrame.apply documentation needs upgrade #17972

Open
tdpetrou opened this issue Oct 25, 2017 · 6 comments
Labels
Apply (Apply, Aggregate, Transform, Map), Docs

Comments

@tdpetrou
Contributor

Problem description

I believe the only place where apply is covered in detail in the documentation is the Essential Basic Functionality section.

None of the examples there needs apply; each can be done faster with other methods. In fact, they actively teach how not to use pandas correctly. For instance, idxmax and interpolate are both DataFrame methods, yet the documentation makes them look like good use cases for apply, which they are not.
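A minimal sketch of the point, using a small made-up frame (the data, the labels, and the variable names are illustrative assumptions, not the snippets from the documentation): calling the DataFrame method directly gives the same result as routing it through apply.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only to illustrate the comparison.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 2.0], "b": [4.0, 6.0, np.nan, 5.0]},
    index=["w", "x", "y", "z"],
)

# The pattern the docs suggest: apply a Series method column by column.
idxmax_via_apply = df.apply(lambda col: col.idxmax())
interp_via_apply = df.apply(lambda col: col.interpolate())

# The idiomatic version: the same methods already exist on DataFrame.
idxmax_direct = df.idxmax()
interp_direct = df.interpolate()

print(idxmax_direct)
print(interp_direct)
```

Both spellings return the same values here; the direct call is shorter and avoids the per-column Python function call.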

Expected Output

I understand that it makes sense to use basic examples to motivate apply. But there should be some useful examples where apply is necessary.

Most of the examples should be replaced (or accompanied by the equivalent, more idiomatic method).

@jreback
Contributor

jreback commented Dec 10, 2017

pls be more specific here.

@jreback added the Needs Info label (Clarification about behavior needed to assess issue) on Dec 10, 2017
@jreback
Contributor

jreback commented Dec 10, 2017

Further, in master .apply has gained some examples (in the docstring).

@jreback added the Docs label on Dec 10, 2017
@tdpetrou
Contributor Author

I know you dislike apply as much as I do. I gave a tutorial on how to avoid apply by vectorizing in my idiomatic pandas talk at PyData NYC, and I used the poor examples from the documentation during the tutorial. Using apply when it's not necessary is one of early pandas users' largest mistakes.

I think it's probably OK to have a couple of basic examples that use apply where it's not necessary, in order to teach the mechanics, but each should then be followed by the idiomatic vectorized version.

You could have a simple set of rules for when it is considered OK to use apply.

  • for a Series - use it only when no equivalent pandas Series method exists (for example, to call the built-in type function on each element)
  • for a DataFrame - use it when a Series method exists but no DataFrame method does (like value_counts or the accessors str, dt, and cat), or for complex functions from other libraries, such as computing the edit distance.
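The two rules above could be sketched roughly as follows. The data and variable names are made up for illustration, and this is only one way to show the idea, not proposed documentation text.

```python
import pandas as pd

# Series rule: there is no Series method that returns each element's
# Python type, so apply with the built-in type function is reasonable.
s = pd.Series([1, "a", 2.5])
element_types = s.apply(type)

# DataFrame rule: apply a Series-only operation to every column.
df = pd.DataFrame({"x": ["a", "b", "a"], "y": ["b", "b", "a"]})

# Per-column counts via the Series method value_counts.
counts = df.apply(pd.Series.value_counts)

# The str accessor lives on Series, so apply reaches it column by column.
upper = df.apply(lambda col: col.str.upper())

print(element_types)
print(counts)
print(upper)
```

In each case apply is only a way to reach functionality that has no direct method on the object being used.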

@jreback
Contributor

jreback commented Dec 10, 2017

Sure, all for a warning box: an example of what folks do and what some better solutions are.

@tdpetrou
Contributor Author

Ok cool. I'll make some changes to the apply docs in the next few days.

@jreback added the Difficulty Intermediate label and removed the Needs Info label (Clarification about behavior needed to assess issue) on Dec 10, 2017
@jreback added this to the Next Major Release milestone on Dec 10, 2017
@jreback
Contributor

jreback commented Dec 10, 2017

Some related .apply doc issues (depending on what is written, some of these may be closable):

#3928
#5299

@mroeschke added the Apply label (Apply, Aggregate, Transform, Map) on Jun 12, 2021
@mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022