ENH: Add wide_to_long convenience function #5564
Looks fine. Can you add a mention in the release notes / v0.13.0 (with an example if you want)? Thanks.
Does this close an issue?
As a remark from somebody who isn't familiar with the function(ality) and is just looking at the docstring: it is not really clear to me what it does. What is meant by wide panel/long format? What is a subobservation? What are stub names? This is terminology I haven't really encountered yet (in pandas).
We'd discussed putting this as a keyword arg to melt in #4920 - thoughts on that?
@jorisvandenbossche Yes, it's inspired by Stata, but wide-format panel data is fairly common in longitudinal datasets in economics. There are many Stack Overflow questions on how to do this in pandas. Stub names are what I'm calling the common roots of the variable names. "Stub" because it's the shorter part of the variable name that you want. E.g., for "A\d*", it's "A". See the example given in the docstring. @jratner I guess I wouldn't mind if the functionality were in melt, but sometimes convenience functions are nice. Especially if you haven't used 'melt' in R, you wouldn't look there. I certainly didn't. (I also find the melt docstring to be sparse, though the examples help immensely.) Cf., among others, http://stackoverflow.com/questions/17688155/complicated-for-me-reshaping-from-wide-to-long-in-pandas
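To make the "stub name" terminology concrete, here is a minimal sketch of the reshape being described. The data and column names are made up for illustration (they are not taken from the PR), and the call assumes the `pd.wide_to_long(df, stubnames, i, j)` signature as it exists in released pandas.

```python
import pandas as pd

# Hypothetical wide-format panel: "A" and "B" are the stubs,
# and the trailing 1970/1980 are the per-period suffixes.
df = pd.DataFrame({
    "A1970": ["a", "b", "c"],
    "A1980": ["d", "e", "f"],
    "B1970": [2.5, 1.2, 0.7],
    "B1980": [3.2, 1.3, 0.1],
    "X": [0.5, -1.0, 2.0],  # a column with no stub; it is carried along per id
})
df["id"] = df.index

# One row per (id, year); one column per stub.
long_df = pd.wide_to_long(df, stubnames=["A", "B"], i="id", j="year").sort_index()
print(long_df)
```

Each wide column such as `A1970` contributes its values to the single long column `A`, with the suffix moved into the `year` level of the index.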
Can you add a release notes entry referencing the issue? Otherwise good to go.
Will try to remember when I get home tonight. Y'all need to hook something like this up. Life saver. https://github.com/statsmodels/statsmodels/blob/master/tools/github_stats.py
I tried this function to solve the Stack Overflow question you linked to, but I get an error: http://nbviewer.ipython.org/github/jorisvandenbossche/scipy_notebooks/blob/master/pandas-pr-wide_to_long.ipynb#Test-example-of-stackoverflow (it's because the second part of the column names, after the stub, is a string rather than an int)
Yes, this is a convenience function intended for proper panel (longitudinal) data where the stub variables are wide-format time-series data. I suppose there's nothing stopping this from taking a regex for the stub ending that defaults to ints, though. I'll have a look.
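The idea of a suffix regex that defaults to integers can be sketched in plain Python. This is only an illustration of the matching logic, not the PR's code: the helper `split_stub` and its `suffix` parameter are hypothetical names.

```python
import re

def split_stub(column, stub, suffix=r"\d+"):
    """Return (stub, suffix) if `column` is `stub` followed by a matching suffix, else None.

    Hypothetical helper: `suffix` is a regex for the stub ending, defaulting to integers.
    """
    match = re.fullmatch(re.escape(stub) + "(" + suffix + ")", column)
    if match is None:
        return None
    return stub, match.group(1)

columns = ["A1970", "A1980", "Aone", "B1970", "X"]

# Default: only integer endings count as belonging to stub "A".
int_matches = [c for c in columns if split_stub(c, "A")]

# Relaxed: any word characters after the stub are accepted.
word_matches = [c for c in columns if split_stub(c, "A", suffix=r"\w+")]

print(int_matches)
print(word_matches)
```

With the default pattern, a column like `Aone` is simply not treated as part of the `A` stub; passing a looser pattern opts in to string endings.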
@jseabold Merge this? Hold off to look at handling ints? (Or just raise NotImplementedError for now?)
Hold off, if you can. I should be able to get to it this weekend.
ping @jseabold
Fixed the int conversion. Someone could likely make this more general. Also rebased.
newdf_j = newdf[j].str.replace(stub, "")
try:  # if it's an int, treat it as such, is there some pandas-fu
      # for type inference?
    newdf_j = newdf_j.astype(int)
com.is_integer()
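For readers following the review thread, the conversion in the quoted diff can be sketched as a standalone function. The diff excerpt cuts off before the `except` clause, so the fallback below (keep the suffixes as strings on `ValueError`) is an assumption, and `strip_stub` is a hypothetical name rather than anything in the PR.

```python
import pandas as pd

def strip_stub(names, stub):
    """Remove `stub` from each name; return ints if every remainder parses as one."""
    # regex=False treats the stub as a literal string rather than a pattern.
    suffix = names.str.replace(stub, "", regex=False)
    try:
        return suffix.astype(int)
    except ValueError:
        # Assumed fallback: non-integer endings are left as strings.
        return suffix

years = strip_stub(pd.Series(["A1970", "A1980"]), "A")
words = strip_stub(pd.Series(["Aone", "Atwo"]), "A")
print(years.tolist())
print(words.tolist())
```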
Please squash this down and then we can merge. Also, if you want to put your docstring example in reshape.rst (if you think it's needed), you can put a mention in v0.13.0 as well (up to you).
Thanks @jseabold
closes #4920
I thought I submitted this PR long ago.