ValueError: Duplicated level error with groupby on index for expanding #21732

Closed
@philippegr

Description

Code Sample, a copy-pastable example if possible

import datetime as dt

import numpy as np
import pandas as pd

# Build the DataFrame (reused from another example; the use of datetime and numpy is probably not important)
date_range = pd.date_range(start=dt.datetime(2017,1,1), end=dt.datetime(2020,12,31), freq='W')
to_concat = []
for val in range(1,5):
    frame_tmp = pd.DataFrame()
    frame_tmp['DT'] = date_range
    frame_tmp['type'] = val
    frame_tmp['value'] = np.random.randint(1, 6, frame_tmp.shape[0])
    to_concat.append(frame_tmp)

df = pd.concat(to_concat, ignore_index=True)

# Fails under pandas 0.23.0 and 0.23.1; worked in 0.22 and some earlier versions
df.set_index('DT').groupby(level=0)['value'].expanding().mean()

# Workaround: grouping by the column instead of setting it as the index works
df.groupby(['DT'])['value'].expanding().mean()

Problem description

Calling expanding().mean() on a groupby over an index level, as in the code above, raises an error in pandas 0.23.0 and 0.23.1:
ValueError: Duplicated level name: "DT", assigned to level 1, is already used for level 0.

Previous versions would insert another level with the same name, so the regression is probably linked to a change in when this error is thrown.
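
A smaller, deterministic version of the same comparison may make the behaviour easier to check across pandas versions. This is only a sketch: the four-row frame is hypothetical toy data (the column names mirror the report), and the try/except is there because the level-based call raises on 0.23.0 and 0.23.1 but is expected to succeed on versions where duplicate level names are allowed.

```python
import pandas as pd

# Hypothetical toy data: two dates, each appearing twice.
df = pd.DataFrame({
    "DT": pd.to_datetime(["2017-01-01", "2017-01-08", "2017-01-01", "2017-01-08"]),
    "value": [1, 2, 3, 4],
})

# Workaround from the report: group by the column rather than an index level.
# The result is indexed by (DT, original row label).
by_column = df.groupby("DT")["value"].expanding().mean()

# The failing call from the report. Both result levels would be named "DT",
# which is what pandas 0.23.0 and 0.23.1 reject with a ValueError.
try:
    by_level = df.set_index("DT").groupby(level=0)["value"].expanding().mean()
    level_error = None
except ValueError as exc:
    by_level = None
    level_error = str(exc)

print(by_column)
print(level_error if by_level is None else by_level)
```

On an affected version, `level_error` should hold the "Duplicated level name" message, while `by_column` is computed either way.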

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.1
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Labels

Bug
Duplicate Report: Duplicate issue or pull request
Groupby
Regression: Functionality that used to work in a prior pandas version