ENH: Add wide_to_long convenience function #5564

jseabold · 2013-11-21T12:46:25Z

I thought I submitted this PR long ago.

jreback · 2013-11-21T20:30:02Z

looks fine, can you add a mention in the release notes / v0.13.0 (with an example if you want)? thanks

jreback · 2013-11-21T20:30:18Z

does this close an issue?

jorisvandenbossche · 2013-11-21T23:16:28Z

As a remark from somebody who isn't familiar with the function(ality) and is just looking at the docstring: it is not really clear to me what it does. What is meant by wide panel/long format? What is a subobservation? What are stub names? This is terminology that I haven't really encountered yet (in pandas).
At first sight, it seems a little bit specific (or the terminology does). But is it possible it comes from Stata? (I searched for "stub names" on the internet, and the only relevant thing I found was something from reshape in Stata.)

jtratner · 2013-11-22T07:49:46Z

@jseabold - you proposed it here - #4920 :)

jtratner · 2013-11-22T07:51:15Z

We'd discussed putting this as a keyword arg to melt in #4920 - thoughts on that?

jseabold · 2013-11-22T09:08:32Z

@jorisvandenbossche Yes, it's inspired by Stata, but wide format panel data is fairly common in longitudinal datasets in economics. There are many Stack Overflow questions on how to do this in pandas.

Stub names are what I'm calling the common roots of the variable names. Stub because it's the shorter part of the variable name that you want. E.g., for "A\d*", it's "A". See the example given in the docstring.
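To make the stub idea concrete, here is a minimal standard-library sketch. It is not the code in this PR: `split_stub` is a hypothetical helper, the column names are made up for illustration, and it assumes the part after the stub is all digits.

```python
import re


def split_stub(column, stubnames):
    """Return (stub, suffix) if `column` is a stub followed by digits, else None.

    Hypothetical helper for illustration only; not part of pandas.
    """
    for stub in stubnames:
        # The stub is the common root; the digits after it identify the period.
        match = re.fullmatch(re.escape(stub) + r"(\d+)", column)
        if match:
            return stub, match.group(1)
    return None


# Made-up wide-format column names: two stubs ("A", "B") observed in two years,
# plus two columns that are not stub variables.
columns = ["A1970", "A1980", "B1970", "B1980", "X", "id"]

groups = {}
for col in columns:
    parts = split_stub(col, ["A", "B"])
    if parts is not None:
        stub, suffix = parts
        groups.setdefault(stub, []).append(suffix)

print(groups)
```

Each stub ends up as one column in the long format, and the collected suffixes become the values of the new sub-observation variable.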

@jtratner I guess I wouldn't mind if the functionality were in melt, but sometimes convenience functions are nice. Especially if you haven't used 'melt' in R, then you wouldn't look there. I certainly didn't. (I also find the melt docstring to be sparse, though the examples help immensely.)

Cf., among others: http://stackoverflow.com/questions/17688155/complicated-for-me-reshaping-from-wide-to-long-in-pandas

jreback · 2013-11-27T16:15:21Z

can you add a release notes entry reffing the issue? otherwise good to go

jseabold · 2013-11-27T16:34:20Z

Will try to remember when I get home tonight. Y'all need to hook something like this up. Life saver.

https://github.com/statsmodels/statsmodels/blob/master/tools/github_stats.py

jorisvandenbossche · 2013-11-27T21:19:42Z

I tried this function to solve the Stack Overflow question you linked to, but I get an error: http://nbviewer.ipython.org/github/jorisvandenbossche/scipy_notebooks/blob/master/pandas-pr-wide_to_long.ipynb#Test-example-of-stackoverflow (it's due to the fact that the second part of the column names (after the stub) is not an int but a string)

jseabold · 2013-11-27T21:46:13Z

Yes, this is a convenience function intended for proper panel (longitudinal) data where the stub variables are wide format timeseries data. I suppose there's nothing stopping this from taking a stub ending regex that defaults to ints though. I'll have a look.

jreback · 2013-11-29T17:40:44Z

@jseabold merge this? hold off to look at handling ints? (or just raise NotImplementedError for now)?

jseabold · 2013-11-29T18:02:18Z

Hold off, if you can. I should be able to get to it this weekend.

jreback · 2013-12-02T18:44:16Z

ping @jseabold

jseabold · 2013-12-02T19:10:27Z

Fixed the int conversion. Someone could likely make this more general. Also rebased.
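The idea behind the fix can be sketched in plain Python, without pandas: strip the stub, try to read what is left as integers, and keep the strings if that fails. `strip_stub` is a hypothetical name, and falling back to strings on failure is an assumption of this sketch, not a quote of the committed code.

```python
def strip_stub(values, stub):
    """Remove a leading `stub` from each value; return ints when all suffixes parse.

    Hypothetical helper for illustration; the fallback to strings is an assumption.
    """
    suffixes = [v[len(stub):] if v.startswith(stub) else v for v in values]
    try:
        # If every suffix is an integer literal, treat the whole column as ints.
        return [int(s) for s in suffixes]
    except ValueError:
        # Otherwise keep the suffixes as strings rather than raising.
        return suffixes


int_like = strip_stub(["A1970", "A1980"], "A")
str_like = strip_stub(["Aone", "Atwo"], "A")
print(int_like, str_like)
```

The conversion is all or nothing here: one non-integer suffix leaves the whole column as strings, which keeps the resulting values of a single type.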

jreback · 2013-12-02T20:45:54Z

pandas/core/reshape.py

+        newdf_j = newdf[j].str.replace(stub, "")
+        try: # if it's an int, treat it as such, is there some pandas-fu
+             # for type inference?
+            newdf_j = newdf_j.astype(int)


com.is_integer()

jreback · 2013-12-03T11:31:03Z

pls squash down and can merge

also if you want to put your docstring example in reshape.rst (if you think it's needed), you can put a mention in v0.13.0 as well (up to you)

jreback · 2013-12-07T14:16:17Z

thanks @jseabold

jreback reviewed Dec 2, 2013

jseabold added 3 commits December 3, 2013 12:10

ENH: Add wide_to_long helper function.

798cf28

ENH: Allow non-int j vals

2c5939b

DOC: Add wide_to_long to release notes.

42a8e97

jreback merged commit 42a8e97 into pandas-dev:master Dec 7, 2013
