Skip to content

BUG: Odd behavior of kwarg 'how' in MultiIndex-to-Index join #10147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dmsul opened this issue May 15, 2015 · 4 comments
Closed

BUG: Odd behavior of kwarg 'how' in MultiIndex-to-Index join #10147

dmsul opened this issue May 15, 2015 · 4 comments
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@dmsul
Copy link

dmsul commented May 15, 2015

import pandas as pd
import numpy as np

from pandas.util.testing import assert_frame_equal

df = pd.DataFrame(np.arange(12).reshape(-1, 3),
                  columns=['a', 'b', 'c'])
df.index = pd.MultiIndex.from_tuples(
    [(1, 2000), (1, 2001), (2, 2000), (2, 2001)],
    names=['id', 'year'])

s = pd.Series([True, True], index=[1, 3], name='flag')
s.index.name = 'id'

# These two should *NOT* be the same, but they are both like a standard 'inner'
left = df.join(s, how='left')
inner = df.join(s, how='inner')
assert_frame_equal(left, inner)
# These two should *NOT* be the same, but they are both like a standard 'outer'
right = df.join(s, how='right')
outer = df.join(s, how='outer')
assert_frame_equal(right, outer)
# A standard 'left' merge is actually achieved with a MultiIndex by invoking `how='right'`
std_left = df.reset_index('year').join(s, how='left').set_index('year', append=True)
assert_frame_equal(right, std_left)

My impression from the example in #6356 is that the effect of how should not change with a MultiIndex. I know that #6363 is what was actually merged, so please correct me if my understanding is wrong.

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.1
nose: 1.3.6
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: None
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 1.1.0dev
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.4
pymysql: None
psycopg2: None

@dmsul dmsul changed the title Odd behavior of join kwarg 'how' with MultiIndex-to-Index join BUG: Odd behavior of kwarg 'how' in MultiIndex-to-Index join May 15, 2015
@jreback
Copy link
Contributor

jreback commented May 18, 2015

this all passes. what exactly is the issue?

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label May 18, 2015
@dmsul
Copy link
Author

dmsul commented May 18, 2015

Sorry, I should have structured the asserts differently and offered more of an explanation. If everything was working correctly, none of assert statements should have passed (is there an 'assert not equals', or wrap assert in assert_raises?).

The example code above should result in four unique output DataFrames, one for each how type. The first two assertions show that this is not the case: left and inner are the same, right and outer are the same. The last assertion shows that 'left' and 'right' are also kind of switched when using the MultiIndex, relative to the standard case.

I have edited the comments and structure of the example code to be (hopefully) a little clearer.

@jreback
Copy link
Contributor

jreback commented May 18, 2015

Converting to standard form, e.g straight merges. These looks ok to me.

So its possible that the aligning logic for frame/series is off. Want to dig in?

In [38]: s2 = s.to_frame().reset_index()

In [39]: df2 = df.reset_index()

In [40]: s2
Out[40]: 
   id  flag
0   1  True
1   3  True

In [41]: df2
Out[41]: 
   id  year  a   b   c
0   1  2000  0   1   2
1   1  2001  3   4   5
2   2  2000  6   7   8
3   2  2001  9  10  11

In [42]: pd.merge(df2,s2,on=['id'],how='left')
Out[42]: 
   id  year  a   b   c  flag
0   1  2000  0   1   2  True
1   1  2001  3   4   5  True
2   2  2000  6   7   8   NaN
3   2  2001  9  10  11   NaN

In [43]: pd.merge(df2,s2,on=['id'],how='inner')
Out[43]: 
   id  year  a  b  c  flag
0   1  2000  0  1  2  True
1   1  2001  3  4  5  True

In [44]: pd.merge(df2,s2,on=['id'],how='right')
Out[44]: 
   id  year   a   b   c  flag
0   1  2000   0   1   2  True
1   1  2001   3   4   5  True
2   3   NaN NaN NaN NaN  True

In [45]: pd.merge(df2,s2,on=['id'],how='outer')
Out[45]: 
   id  year   a   b   c  flag
0   1  2000   0   1   2  True
1   1  2001   3   4   5  True
2   2  2000   6   7   8   NaN
3   2  2001   9  10  11   NaN
4   3   NaN NaN NaN NaN  True

@dmsul
Copy link
Author

dmsul commented May 18, 2015

Sure, I'll give it a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

2 participants