API: column ordering on `get_dummies` #12010 #17612

Giftlin · 2017-09-21T12:41:49Z

Using get_dummies moves the new dummy columns to the end. What @jreback and @TomAugspurger have commented is right, but there are situations in which we need to preserve the column order. For example, in scipy model creation algorithms, the functions give preference to columns based on their order, so we have to reorder the columns explicitly. I get that this is not pandas' concern, but it would be better to have an option to preserve the order.

So I think we should at least have an option to preserve the order. The default can stay as it is, with the new columns placed at the end.

TomAugspurger · 2017-09-21T17:33:52Z

What would "preserve the order" mean here? Could you show an example? I don't know if we have a keyword like this anywhere else in the library, so this would be a bit unusual. However, given that the function knows the names of the newly created columns, while the user might not, this may be worth adding.


Giftlin · 2017-09-21T17:51:19Z

import pandas as pd

df = pd.DataFrame({'A': ["apple", "apple", "orange", "orange", "lemon", "lemon"],
                   'B': [98, 87, 45, 25, 12, 5]})
df = pd.get_dummies(df)
print(df)

Output:

Output (preserving order of columns):
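For reference, the two column orders being compared can be sketched as follows. This is a minimal illustration rather than the original output: the "preserved" order is built here by explicit reindexing, which is the manual step the request wants to avoid, and only the column labels are printed because the dummy dtype differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["apple", "apple", "orange", "orange", "lemon", "lemon"],
    "B": [98, 87, 45, 25, 12, 5],
})

# Current behaviour: non-encoded columns come first, dummy columns are appended.
dummies = pd.get_dummies(df)
current_order = list(dummies.columns)
print(current_order)  # ['B', 'A_apple', 'A_lemon', 'A_orange']

# Requested behaviour: the dummy columns sit where "A" used to be.
# Written out by hand here, since get_dummies has no option for it.
preserved = dummies[["A_apple", "A_lemon", "A_orange", "B"]]
preserved_order = list(preserved.columns)
print(preserved_order)  # ['A_apple', 'A_lemon', 'A_orange', 'B']
```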

jreback · 2017-09-21T19:58:39Z

Categoricals already allow you to provide an order. Note that these are not strictly ordered=True; rather, you are providing the order of the categories, as opposed to a lex-sort order.

And this works as expected.

In [10]: df['A_ordered'] = df['A'].astype('category', categories=['apple', 'orange', 'lemon'])

In [11]: df
Out[11]: 
        A   B A_ordered
0   apple  98     apple
1   apple  87     apple
2  orange  45    orange
3  orange  25    orange
4   lemon  12     lemon
5   lemon   5     lemon

In [12]: df.A_ordered
Out[12]: 
0     apple
1     apple
2    orange
3    orange
4     lemon
5     lemon
Name: A_ordered, dtype: category
Categories (3, object): [apple, orange, lemon]

In [13]: pd.get_dummies(df.A)
Out[13]: 
   apple  lemon  orange
0      1      0       0
1      1      0       0
2      0      0       1
3      0      0       1
4      0      1       0
5      0      1       0

In [14]: pd.get_dummies(df.A_ordered)
Out[14]: 
   apple  orange  lemon
0      1       0      0
1      1       0      0
2      0       1      0
3      0       1      0
4      0       0      1
5      0       0      1
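The session above passes categories= directly to astype, which recent pandas releases do not accept. A rough equivalent, assuming a pandas version that exposes pd.CategoricalDtype at the top level, might look like this (the name fruit_dtype is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["apple", "apple", "orange", "orange", "lemon", "lemon"],
    "B": [98, 87, 45, 25, 12, 5],
})

# Spell out the category order in a dtype object instead of astype keywords.
fruit_dtype = pd.CategoricalDtype(categories=["apple", "orange", "lemon"])
df["A_ordered"] = df["A"].astype(fruit_dtype)

# Plain object/string column: dummy columns follow the sorted unique values.
default_cols = list(pd.get_dummies(df["A"]).columns)
# Categorical column: dummy columns follow the category order given above.
ordered_cols = list(pd.get_dummies(df["A_ordered"]).columns)

print(default_cols)  # ['apple', 'lemon', 'orange']
print(ordered_cols)  # ['apple', 'orange', 'lemon']
```

Note that this only controls the order of the dummy columns relative to each other, not where they land in the frame.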

Giftlin · 2017-09-21T20:01:52Z

@jreback no, I'm not talking about column values here. It is the column I am talking about.

jreback · 2017-09-21T20:23:19Z

In [6]: pd.concat([df.drop(['A'], axis=1), pd.get_dummies(df.A)], axis=1)
Out[6]: 
    B  apple  lemon  orange
0  98      1      0       0
1  87      1      0       0
2  45      0      0       1
3  25      0      0       1
4  12      0      1       0
5   5      0      1       0

TomAugspurger · 2017-09-21T21:00:54Z

I think that example only works here since A was first.

My initial reaction was to just have the user do pd.get_dummies(df)[desired_order]. Getting that order is somewhat difficult...

def get_order(df):
    order = []
    for col in df.columns:
        if isinstance(df[col].dtype, pd.CategoricalDtype):
            order.extend(['{}_{}'.format(col, val)
                          for val in df[col].cat.categories])
        else:
            order.append(col)
    return order

That doesn't handle object types, but it wouldn't be hard to fix it to do that.
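One way the object-dtype case might be covered is sketched below. It is an assumption-laden variant, not a tested recipe: it assumes the default prefix and "_" separator, no dummy_na, and that object/string columns get their dummy columns in the same order as the categories of pd.Categorical(values). The dtype test uses an isinstance check against pd.CategoricalDtype plus the is_object_dtype and is_string_dtype helpers.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def get_order(df):
    """Return get_dummies column names, ordered like df's original columns."""
    order = []
    for col in df.columns:
        dtype = df[col].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            # Categorical columns keep the category order they were given.
            categories = dtype.categories
        elif is_object_dtype(dtype) or is_string_dtype(dtype):
            # Assumption: object/string columns are encoded via their
            # sorted unique values, which pd.Categorical reproduces.
            categories = pd.Categorical(df[col]).categories
        else:
            # Columns that get_dummies leaves untouched keep their name.
            order.append(col)
            continue
        order.extend(f"{col}_{val}" for val in categories)
    return order


df = pd.DataFrame({
    "A": ["apple", "apple", "orange", "orange", "lemon", "lemon"],
    "B": [98, 87, 45, 25, 12, 5],
})
result = pd.get_dummies(df)[get_order(df)]
print(list(result.columns))  # ['A_apple', 'A_lemon', 'A_orange', 'B']
```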

We could start with a cookbook recipe? And expand to a keyword argument if others want it? This is on the borderline of whether or not it's worth a keyword to me.

jorisvandenbossche · 2017-09-22T07:35:21Z

You can also do something like this to preserve the exact order:

In [178]: pd.concat([pd.get_dummies(df[col], prefix=col) if df[col].dtype == object else df[col] for col in df], axis=1)
Out[178]: 
   A_apple  A_lemon  A_orange   B
0        1        0         0  98
1        1        0         0  87
2        0        0         1  45
3        0        0         1  25
4        0        1         0  12
5        0        1         0   5

Only the `if df[col].dtype == object` condition would need to be adapted to do exactly what you want (e.g. probably also covering categorical columns).
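A possible adaptation of that condition is sketched here, wrapped in a hypothetical helper named dummies_in_place (not a pandas function). It treats categorical, object and string columns as the ones to encode, which is an assumption about what get_dummies would pick by default, and concatenates the pieces in the original column order.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def dummies_in_place(df):
    """Encode categorical-like columns where they stand (hypothetical helper)."""
    pieces = []
    for col in df.columns:
        dtype = df[col].dtype
        needs_encoding = (
            isinstance(dtype, pd.CategoricalDtype)
            or is_object_dtype(dtype)
            or is_string_dtype(dtype)
        )
        if needs_encoding:
            # Dummy columns take the place of the original column.
            pieces.append(pd.get_dummies(df[col], prefix=col))
        else:
            pieces.append(df[col])
    return pd.concat(pieces, axis=1)


df = pd.DataFrame({
    "B": [98, 87, 45, 25, 12, 5],
    "A": pd.Categorical(
        ["apple", "apple", "orange", "orange", "lemon", "lemon"],
        categories=["apple", "orange", "lemon"],
    ),
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
encoded = dummies_in_place(df)
print(list(encoded.columns))
# ['B', 'A_apple', 'A_orange', 'A_lemon', 'C']
```

Here the categorical column sits in the middle of the frame, so the result shows both the position being kept and the category order being respected.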

kimsin98 · 2021-08-23T03:06:08Z

This needs to be reopened. @jreback's response is irrelevant. What @Giftlin proposed is inserting the new dummy columns where the categorical column used to be, rather than appending them to the end. It has nothing to do with the order of items within categoricals.

I would even go as far as saying users probably expect column orders to be preserved when using get_dummies, so this should be the default.

greenare · 2023-08-23T02:07:00Z

@jreback you totally misunderstood what is going on. Please reopen this issue.

gfyoung added the API Design label Sep 21, 2017

Giftlin added a commit to Giftlin/pandas that referenced this issue Sep 21, 2017

column ordering on get_dummies pandas-dev#17612

6cdbe4f

Giftlin added a commit to Giftlin/pandas that referenced this issue Sep 21, 2017

test - get_dummy pandas-dev#17612

056ffe6

jreback closed this as completed Sep 21, 2017

jreback added Categorical Categorical Data Type Usage Question labels Sep 21, 2017

jreback added this to the No action milestone Sep 21, 2017
