dataframe.groupby incorrect with multiindex and None value #32492

Closed
ataudt opened this issue Mar 6, 2020 · 2 comments · Fixed by CSCD01-team14/pandas#2 or #45982
Labels: good first issue, Groupby, MultiIndex, Needs Tests (unit test(s) needed to prevent regressions)

ataudt commented Mar 6, 2020

Minimally reproducible example

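# pandas==0.24.2
import pandas as pd

df = pd.DataFrame(data={
    'A': ['a1', 'a2', None],
    'B': ['b1', 'b2', 'b1'],
    'val': [1, 2, 3],
})
print(df)
print('\n')

# ('a2', 'b1') is not a combination present in df, so this should raise KeyError
grps = df.groupby(['A', 'B'])
print(grps.get_group(('a2', 'b1')))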

Problem description

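When column 'A' contains None, calling get_group with the key ('a2', 'b1'), a combination that does not occur in the data, returns a group that should not exist instead of raising an error.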

Expected Output

KeyError

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.7
lxml.etree: 4.3.3
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.3.8
pymysql: 0.9.3
psycopg2: 2.8.3 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@WillySong

take

@mroeschke (Member)

Looks fixed on master. Could use a test

In [7]: df = pd.DataFrame(data={
   ...:     'A': ['a1','a2',None],
   ...:     'B': ['b1','b2','b1'],
   ...:     'val': [1,2,3],
   ...: })
   ...: print(df)
   ...: print('\n')
   ...:
   ...: grps = df.groupby(['A', 'B'])
   ...: print(grps.get_group(('a2','b1')))
      A   B  val
0    a1  b1    1
1    a2  b2    2
2  None  b1    3
KeyError: ('a2', 'b1')
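
A regression test along these lines might cover the case. This is only a sketch: the helper name get_group_raises_keyerror and the test name are hypothetical, and the check is written with plain try/except and assert rather than whatever conventions the pandas test suite actually uses.

```python
import pandas as pd


def get_group_raises_keyerror(grouped, key):
    """Hypothetical helper: True if grouped.get_group(key) raises KeyError."""
    try:
        grouped.get_group(key)
    except KeyError:
        return True
    return False


def test_get_group_missing_key_with_none_in_grouper():
    # Same frame as in the report: column 'A' contains a None value.
    df = pd.DataFrame(data={
        'A': ['a1', 'a2', None],
        'B': ['b1', 'b2', 'b1'],
        'val': [1, 2, 3],
    })
    grps = df.groupby(['A', 'B'])

    # ('a2', 'b1') never occurs in df, so get_group should raise KeyError.
    assert get_group_raises_keyerror(grps, ('a2', 'b1'))

    # A key that does occur should still return its rows.
    assert grps.get_group(('a2', 'b2'))['val'].tolist() == [2]


test_get_group_missing_key_with_none_in_grouper()
```

On a version that still has the bug, the first assertion would be expected to fail, because get_group returns a frame instead of raising.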

mroeschke added the good first issue and Needs Tests (unit test(s) needed to prevent regressions) labels on Jul 29, 2021
jreback added this to the 1.5 milestone on Feb 16, 2022