No API reference for DataFrameGroupBy and "combining" step #2644
+1. An undocumented GroupBy really limits the usefulness of Pandas.
Well, the groupby docs are quite complete; what exactly is the problem? If you read through them, you will see exactly what a http://pandas.pydata.org/pandas-docs/dev/groupby.html#splitting-an-object-into-groups And the fact that it applies the functions directly http://pandas.pydata.org/pandas-docs/dev/groupby.html#aggregation If you think there is something missing, then please be specific about it.
is a very odd statement
There are some API docs missing for
Yep, I also follow the initial poster. @jreback I certainly think the groupby guide is already in very good shape. But I personally think it is important to see that the user guide and the reference docs are complementary, and both are important. @mhlr It would indeed be a bit more constructive to say what you miss than just saying that it limits the usefulness. In fact, if you want to do that, giving some concrete things you miss in the docs about this would be very helpful for us! Really. @JanSchulz What exactly do you mean by the missing "combine" step? I think you misinterpret the combine step: it is about combining the different groups together into one DataFrame, not about combining the output of the groupby with the original array (for that you can just assign it to a column in the original array, or concat both, ...). Is that not clear enough from the first paragraph in the docs?
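A minimal sketch of the two options mentioned above for combining a groupby result with the original frame, assuming a reasonably recent pandas (`df`, `key`, and `val` are made-up names for illustration). `transform` returns one value per original row, aligned on the original index, so the result can be assigned as a new column or concatenated alongside the frame.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})

# Option 1: transform returns one value per original row (aligned on df's index),
# so the result can be assigned straight back as a new column.
df["group_mean"] = df.groupby("key")["val"].transform("mean")

# Option 2: compute a transformed Series and concatenate it next to the original frame.
demeaned = df.groupby("key")["val"].transform(lambda s: s - s.mean()).rename("val_demeaned")
combined = pd.concat([df, demeaned], axis=1)

print(combined)
```

Both routes rely on the same property: the transformed output keeps the original row index, so pandas can line it up with the source rows without an explicit join.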
And something else: in the original issue @JanSchulz also mentions that the docs only speak about the GroupBy object, while in practice you always get a DataFrameGroupBy or SeriesGroupBy object. For us that is maybe obvious, but not for a lot of users, I think (though there is not much to do about that, I suppose, apart from mentioning it in the docs).
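To illustrate which object you actually get back, here is a small sketch with toy data (assuming pandas is installed). Grouping a DataFrame gives a `DataFrameGroupBy`, while selecting a single column from it, or grouping a Series directly, gives a `SeriesGroupBy`; both have a class named `GroupBy` in their method resolution order.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a"], "x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

frame_gb = df.groupby("key")              # group a DataFrame
column_gb = df.groupby("key")["x"]        # select one column from it
series_gb = df["x"].groupby(df["key"])    # group a Series directly

for obj in (frame_gb, column_gb, series_gb):
    # collect the class names along the inheritance chain
    bases = [cls.__name__ for cls in type(obj).__mro__]
    print(type(obj).__name__, "GroupBy" in bases)
# DataFrameGroupBy True
# SeriesGroupBy True
# SeriesGroupBy True
```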
In my experience with pandas I've almost never cared about the distinction between the types of groupby objects, whereas with Series vs. DataFrame I have to be more cognizant of API differences (but even then not so much). IMO, distinguishing between the different kinds of groupbys in the main docs is too much detail and is best left for the API docs.
DataFrame.groupby is well documented. It returns a DataFrameGroupBy object, which is not documented.
@cpcloud yes, I agree fully with that, but the user still sees DataFrameGroupBy; maybe you know that you shouldn't care about it, but some other users don't.
I suppose it's easy enough to add the whitelisted methods to the groupby API section, but that seems kind of repetitive; maybe list them in the groupby docs (so we can auto-generate them)... that is the big issue here.
with the exception of certain methods which are slightly different (which is itself an API bug; they should be the same)
I don't think listing whitelisted methods is appropriate... why not just list the methods that are specific to groupby-type objects?
@mhlr Thanks for the elaboration. The GroupBy object is indeed missing from the API docs (and it is actually not much work to add it, just one line in api.rst). As said in the previous comments, DataFrameGroupBy is actually the 'same' (for a user).
@cpcloud Why isn't it appropriate? Why else whitelist them? They are whitelisted because they are useful, so shouldn't users be able to know about them?
Well, can we auto-generate them in the API section? (I want to avoid doing this manually, so that they are only in one place, namely the whitelist.)
I think it would be useful to have {DataFrame,Series}GroupBy API pages as well, even if they just link or redirect to the GroupBy page, just to avoid the initial confusion caused by not being able to find them. There should definitely be a GroupBy page.
@mhlr did you see this: Well, some are already documented: http://pandas.pydata.org/pandas-docs/stable/api.html#groupby
But not all are that useful... (aggregate and transform are just empty). @jreback About listing the whitelisted methods automatically, I don't know if this is possible, because in a Python session they don't appear if you do
@jorisvandenbossche ahh...I think
@jreback actually it is numpydoc (not sphinx autodoc) that is doing this, and they use
umm....you know better than I how that works, but if it uses
yep,
@jorisvandenbossche
@jorisvandenbossche I was thinking that the whitelisted methods can be seen with tab completion and operate on each group in exactly the same way that they work without the groupby... so I was thinking that that's very repetitive... not a huge deal
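A quick sketch of that point with toy data, using `cumsum` as an example of a method that exists on both `Series` and the groupby object (which methods were actually on the whitelist at the time is not stated here). The groupby version gives the same values as calling `Series.cumsum` on each group by hand and stitching the pieces back together in the original row order.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b", "a"], "x": [1, 2, 3, 4, 5]})
g = df.groupby("key")["x"]

# The groupby method: cumulative sum restarted within each group.
via_groupby = g.cumsum()

# The same thing by hand: Series.cumsum on each group, then stitch the pieces
# back together in the original row order.
by_hand = pd.concat([grp.cumsum() for _, grp in g]).sort_index()

print(via_groupby.tolist())                       # [1, 2, 4, 6, 9]
print(via_groupby.tolist() == by_hand.tolist())   # True
```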
|
@cpcloud actually I agree it's maybe a bit repetitive to have separate generated API docstring pages for each of them, as it is indeed exactly the same method (so in that case they don't need to appear in
@jorisvandenbossche fair enough :)
you could generate the
@jreback yep, that is actually true, and a good idea! Just inject a string with the list of methods into the groupby docstring from the @jreback strange, as the docstring of
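A minimal sketch of the docstring-injection idea in plain Python. `_WHITELIST`, `_with_method_list`, and `ToyGroupBy` are hypothetical names, not pandas internals. A class decorator appends the whitelisted names to the class docstring, so the list is maintained in one place and any documentation tool that reads `__doc__` would pick it up.

```python
# Hypothetical whitelist of method names that a GroupBy-like class forwards to each group.
_WHITELIST = ("cumsum", "cummax", "describe")


def _with_method_list(cls):
    """Class decorator (hypothetical): append the whitelisted names to cls.__doc__."""
    listing = "\n".join(f"    - {name}" for name in sorted(_WHITELIST))
    cls.__doc__ = (cls.__doc__ or "") + "\n\nMethods applied to each group:\n" + listing + "\n"
    return cls


@_with_method_list
class ToyGroupBy:
    """A toy stand-in for a GroupBy class."""


print(ToyGroupBy.__doc__)
```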
@jorisvandenbossche I think
@jreback but I think only in class instances, not the class itself (because if I do
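A plain-Python sketch of why forwarded names can show up for an instance but not for the class. `ToyGroupBy` and its `_whitelist` are hypothetical and not how pandas is implemented. `__getattr__` only runs for attribute lookups on instances, and a `__dir__` defined on the class customizes `dir(instance)` but not `dir(cls)`, which goes through the metaclass (`type`) instead.

```python
import itertools


class ToyGroupBy:
    # Hypothetical whitelist: method name -> function applied to each group's list.
    _whitelist = {
        "cumsum": lambda vals: list(itertools.accumulate(vals)),
        "max": max,
    }

    def __init__(self, groups):
        self._groups = groups  # dict mapping group label -> list of values

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails on an instance.
        whitelist = type(self)._whitelist
        if name in whitelist:
            func = whitelist[name]
            return lambda: {key: func(vals) for key, vals in self._groups.items()}
        raise AttributeError(f"{type(self).__name__!r} object has no attribute {name!r}")

    def __dir__(self):
        # Advertise the forwarded names for dir() and tab completion on instances.
        return sorted(set(super().__dir__()) | set(self._whitelist))


g = ToyGroupBy({"a": [1, 3, 5], "b": [2, 4]})
print(g.cumsum())                    # {'a': [1, 4, 9], 'b': [2, 6]}
print("cumsum" in dir(g))            # True: the instance-level __dir__ is used
print("cumsum" in dir(ToyGroupBy))   # False: dir(cls) uses type.__dir__
```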
I dug up my draft notes about this and made a new issue (#6944) with an overview of what I think is missing in the reference docs (I made a new issue because this one is already getting long and was originally not that broad, and you know I like clear overview issues :-))
ok... let's close this and use your new issue then, maybe
@jorisvandenbossche My issue was simply that the docs have the big header "split-apply-combine" but afterwards only talk about "split" and "apply", and never about combining (the only mention of combine is: "Combining the results into a data structure"). As a newbie to that paradigm, it was a bit confusing to miss that step in the docs.
@JanSchulz OK, that can indeed be made clearer in the intro. The reason there is no separate "combine" section afterwards is that it happens at the same time as the "apply". And you cannot really adjust this as a user (there are no different ways to 'combine'); the groups are just concatenated into a DataFrame/Series (in contrast to e.g.
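To make the implicit combine step concrete, a small sketch with toy data: a single groupby aggregation is equivalent to splitting by key, applying the function to each piece, and gluing the per-group results into one Series indexed by the group keys.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})

# One call: split by "key", apply sum, combine into a Series indexed by key.
auto = df.groupby("key")["val"].sum()

# The same three steps written out by hand.
pieces = {key: grp["val"].sum() for key, grp in df.groupby("key")}  # split + apply
manual = pd.Series(pieces, name="val").rename_axis("key")          # combine

print(auto.tolist() == manual.tolist())          # True
print(list(auto.index) == list(manual.index))    # True
```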
@jorisvandenbossche is there a feature request in there somewhere? 😄
@cpcloud I don't know, is there a need? :-)
Oh I see what you mean. Yes, that is a very nice aspect of pandas.
At least by searching Google, I found no references for DataFrameGroupBy. The docs at http://pandas.pydata.org/pandas-docs/dev/groupby.html speak about a "GroupBy object"...
What are all the functions of that object? Using IPython, I get inline help, but this information seems to be missing from the online docs.
I'm also missing the "combine" step in the split-apply-combine docs: how can values be added to the original DataFrame? I only found something about transform, which returns a complete DataFrame. grouped.apply() returns a single Series. Did I miss something, or is the combine step really just "merge the new DataFrame/Series with the old one"?
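One possible answer, sketched with toy data and assuming a recent pandas: either aggregate per group and merge the result back onto the original rows, or use `transform`, which returns a result already aligned with the original index so no merge is needed.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})

# Route 1: aggregate (one row per group), then merge back onto the original rows.
totals = df.groupby("key")["val"].sum().rename("key_total").reset_index()
merged = df.merge(totals, on="key", how="left")

# Route 2: transform gives the same per-row column without an explicit merge.
df["key_total"] = df.groupby("key")["val"].transform("sum")

print(merged["key_total"].tolist() == df["key_total"].tolist())  # True
```

The merge route is the "merge the new Series with the old one" approach described above; `transform` is usually the shorter option when the goal is one value per original row.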