ENH: Add wide_to_long convenience function #5564

Merged: jreback merged 3 commits into pandas-dev:master on Dec 7, 2013

Conversation

jseabold
Contributor

closes #4920

I thought I submitted this PR long ago.

@jreback
Contributor

jreback commented Nov 21, 2013

Looks fine. Can you add a mention in the release notes / v0.13.0 (with an example if you want)? Thanks.

@jreback
Contributor

jreback commented Nov 21, 2013

does this close an issue?

@jorisvandenbossche
Member

As a remark from somebody who isn't familiar with the function(ality) and is just looking at the docstring: it is not really clear to me what it does. What is meant by wide panel/long format? What is a subobservation? What are stub names? This is terminology that I haven't really encountered yet (in pandas).
At first sight, it seems a little bit specific (or at least the terminology does). But is it possible it comes from STATA? (I searched for "stub names" on the internet, and the only relevant thing I found was something from reshape in STATA.)

@jtratner
Contributor

@jseabold - you proposed it here - #4920 :)

@jtratner
Contributor

We'd discussed putting this as a keyword arg to melt in #4920 - thoughts on that?

@jseabold
Contributor Author

@jorisvandenbossche Yes, it's inspired by stata, but wide format panel data is fairly common in longitudinal datasets in economics. There are many stack overflow questions on how to do this in pandas.

Stub names are what I'm calling the common roots of the variable names. "Stub" because it's the shorter part of the variable name that you want. E.g., for "A\d*", it's "A". See the example given in the docstring.
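
For readers who have not seen the docstring, here is a minimal sketch of the idea. The column names and the `id`/`year` labels are made up for illustration, and the call assumes the `wide_to_long(df, stubnames, i, j)` signature as it appears in released pandas:

```python
import pandas as pd

# Wide-format panel: one row per unit, one column per (variable, year) pair.
wide = pd.DataFrame({
    "A1970": ["a", "b", "c"],
    "A1980": ["d", "e", "f"],
    "B1970": [2.5, 1.2, 0.7],
    "B1980": [3.2, 1.3, 0.1],
})
wide["id"] = wide.index  # identifier for each unit (illustrative name)

# "A" and "B" are the stubs; the trailing digits become the "year" index level.
long_df = pd.wide_to_long(wide, stubnames=["A", "B"], i="id", j="year").sort_index()
print(long_df)
```

The result has one row per (id, year) pair and one column per stub, which is the long format the function name refers to.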

@jtratner I guess I wouldn't mind if the functionality were in melt, but sometimes convenience functions are nice. Especially if you haven't used 'melt' in R, then you wouldn't look there. I certainly didn't. (I also find the melt docstring to be sparse, though the examples help immensely.)

Cf. this question, among others: http://stackoverflow.com/questions/17688155/complicated-for-me-reshaping-from-wide-to-long-in-pandas

@jreback
Contributor

jreback commented Nov 27, 2013

Can you add a release notes entry referencing the issue? Otherwise good to go.

@jseabold
Contributor Author

Will try to remember when I get home tonight. Y'all need to hook something like this up. Life saver.

https://github.com/statsmodels/statsmodels/blob/master/tools/github_stats.py

@jorisvandenbossche
Member

I tried this function to solve the stackoverflow question you linked to, but I get an error: http://nbviewer.ipython.org/github/jorisvandenbossche/scipy_notebooks/blob/master/pandas-pr-wide_to_long.ipynb#Test-example-of-stackoverflow (it's because the second part of the column names, after the stub, is a string rather than an int)

@jseabold
Contributor Author

Yes, this is a convenience function intended for proper panel (longitudinal) data where the stub variables are wide format timeseries data. I suppose there's nothing stopping this from taking a stub ending regex that defaults to ints though. I'll have a look.
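
A rough sketch of what a stub-ending regex that defaults to ints could look like, using only the standard library. The helper name `split_columns` and its `suffix` parameter are hypothetical and are not taken from this PR:

```python
import re

def split_columns(columns, stub, suffix=r"\d+"):
    """Hypothetical helper: map each suffix to the column named stub + suffix."""
    # The whole column name must be the stub followed by the suffix pattern.
    pattern = re.compile("^" + re.escape(stub) + "(" + suffix + ")$")
    matches = {}
    for col in columns:
        m = pattern.match(col)
        if m:
            matches[m.group(1)] = col
    return matches

cols = ["A1970", "A1980", "Aone", "B1970"]
ints_only = split_columns(cols, "A")                 # default: integer suffixes
any_word = split_columns(cols, "A", suffix=r"\w+")   # also picks up "Aone"
print(ints_only)
print(any_word)
```

With the default pattern, "Aone" is simply not treated as an "A" column; passing a looser pattern opts in to string suffixes.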

@jreback
Contributor

jreback commented Nov 29, 2013

@jseabold merge this? Or hold off to look at handling ints? (Or just raise NotImplementedError for now?)

@jseabold
Contributor Author

Hold off, if you can. I should be able to get to it this weekend.

@jreback
Contributor

jreback commented Dec 2, 2013

ping @jseabold

@jseabold
Contributor Author

jseabold commented Dec 2, 2013

Fixed the int conversion. Someone could likely make this more general. Also rebased.

newdf_j = newdf[j].str.replace(stub, "")
try:  # if it's an int, treat it as such; is there some pandas-fu
    # for type inference?
    newdf_j = newdf_j.astype(int)
Contributor

com.is_integer()
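
The diff excerpt above stops before the `except` clause. A self-contained sketch of the same try-the-cast-and-fall-back pattern might look like the following; the helper name `strip_stub` and the choice to catch `ValueError` are assumptions made for illustration, not a quote of the merged code:

```python
import pandas as pd

def strip_stub(names, stub):
    """Hypothetical helper: drop `stub` from each name, casting to int if possible."""
    # regex=False treats the stub as a literal string rather than a pattern.
    suffixes = names.str.replace(stub, "", regex=False)
    try:
        return suffixes.astype(int)
    except ValueError:
        # Non-numeric suffixes (e.g. "one") are left as strings.
        return suffixes

years = strip_stub(pd.Series(["A1970", "A1980"]), "A")
words = strip_stub(pd.Series(["Aone", "Atwo"]), "A")
print(years.tolist(), words.tolist())
```

The cast is all-or-nothing: if any suffix is non-numeric, the whole column stays as strings.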

@jreback
Contributor

jreback commented Dec 3, 2013

pls squash down and can merge

Also, if you want to put your docstring example in reshape.rst (if you think it's needed), you can put a mention in v0.13.0 as well (up to you).

jreback merged commit 42a8e97 into pandas-dev:master on Dec 7, 2013
@jreback
Contributor

jreback commented Dec 7, 2013

thanks @jseabold

Successfully merging this pull request may close these issues.

Wide to long panel function
4 participants