
ValueError: Duplicated level error with groupby on index for expanding #21732


Closed
philippegr opened this issue Jul 4, 2018 · 1 comment
Labels
Bug Duplicate Report Duplicate issue or pull request Groupby Regression Functionality that used to work in a prior pandas version

philippegr commented Jul 4, 2018

Code Sample, a copy-pastable example if possible

import datetime as dt

import numpy as np
import pandas as pd

# Build the DataFrame (reused from an example; the use of datetime and numpy is probably not essential)
date_range = pd.date_range(start=dt.datetime(2017, 1, 1), end=dt.datetime(2020, 12, 31), freq='W')
to_concat = []
for val in range(1, 5):
    frame_tmp = pd.DataFrame()
    frame_tmp['DT'] = date_range
    frame_tmp['type'] = val
    frame_tmp['value'] = np.random.randint(1, 6, frame_tmp.shape[0])
    to_concat.append(frame_tmp)

df = pd.concat(to_concat, ignore_index=True)

# Raises ValueError on pandas 0.23.0 and 0.23.1; worked on 0.22 and some earlier versions
df.set_index('DT').groupby(level=0)['value'].expanding().mean()

# Workaround: grouping by the 'DT' column instead of the index level works
df.groupby(['DT'])['value'].expanding().mean()
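A smaller, deterministic version of the reproduction may make the difference between the two calls easier to see. This is a sketch assuming a recent pandas release (where the first call no longer raises); fixed values replace `np.random.randint`, and the names `dates`, `by_level` and `by_column` are illustrative only.

```python
import pandas as pd

# Assumes a recent pandas release. Fixed values stand in for np.random so the
# output is reproducible: two "type" series over the same three weekly dates.
dates = pd.to_datetime(["2017-01-01", "2017-01-08", "2017-01-15"])
df = pd.concat(
    [pd.DataFrame({"DT": dates, "type": t, "value": [t, t + 1, t + 2]}) for t in (1, 2)],
    ignore_index=True,
)

# Grouping by the index level: the group key is prepended to the original
# DT index, so the result has two index levels that are both named "DT".
by_level = df.set_index("DT").groupby(level=0)["value"].expanding().mean()

# Grouping by the column: the second level is the original integer index,
# so there is no clash between level names.
by_column = df.groupby(["DT"])["value"].expanding().mean()

print(by_level.index.names)
print(by_level.tolist())
print(by_column.index.names)
```

Both calls compute the same expanding means per date; only the second index level differs.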

Problem description

When calling expanding().mean() on a groupby over an index level, as in the code above, pandas 0.23.0 and 0.23.1 raise:
ValueError: Duplicated level name: "DT", assigned to level 1, is already used for level 0.

Previous versions instead returned a result whose index had a second level with the same name, so this is probably linked to a change in when this error is raised.
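The error message points at the `MultiIndex` level-name check. As a sketch, assuming a recent pandas release in which a `MultiIndex` may again carry duplicate level names, the index below mimics the one the failing call would build: both levels are named `DT`, so looking up a level by position works, while looking it up by name is ambiguous and raises `ValueError`. The names `mi`, `outer` and `lookup_error` are illustrative only.

```python
import pandas as pd

# Assumes a recent pandas release, where a MultiIndex may carry duplicate
# level names. Both levels are named "DT", like the failing call's result.
mi = pd.MultiIndex.from_arrays(
    [["2017-01-01", "2017-01-08"], ["2017-01-01", "2017-01-08"]],
    names=["DT", "DT"],
)

# Looking up a level by position is unambiguous.
outer = mi.get_level_values(0)

# Looking up a level by name is ambiguous, so pandas raises ValueError.
try:
    mi.get_level_values("DT")
    lookup_error = None
except ValueError as exc:
    lookup_error = str(exc)

print(list(mi.names))
print(lookup_error)
```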

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.1
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jschendel (Member)

Thanks, this looks to be a duplicate of #21075, and is fixed on master. The fix should be included in 0.23.2, which will be released relatively soon.

Your example code does not raise on master:

In [2]: pd.__version__
Out[2]: '0.24.0.dev0+219.g1070976'

In [3]: # Building Dataframe (reused from example, the fact that it uses datetime and numpy is probably not that important) 
   ...: date_range = pd.date_range(start=dt.datetime(2017,1,1), end=dt.datetime(2020,12,31), freq='W')
   ...: to_concat = []
   ...: for val in range(1,5):
   ...:     frame_tmp = pd.DataFrame()
   ...:     frame_tmp['DT'] = date_range
   ...:     frame_tmp['type'] = val
   ...:     frame_tmp['value'] = np.random.randint(1, 6, frame_tmp.shape[0])
   ...:     to_concat.append(frame_tmp)
   ...: 
   ...: df = pd.concat(to_concat, ignore_index=True)
   ...: 

In [4]: # Does not work under pandas 0.23.0 and 0.23.1 worked in 0.22 and for some versions before
   ...: df.set_index('DT').groupby(level=0)['value'].expanding().mean()
Out[4]: 
DT          DT        
2017-01-01  2017-01-01    1.000000
            2017-01-01    2.000000
            2017-01-01    3.000000
            2017-01-01    2.750000
2017-01-08  2017-01-08    3.000000
            2017-01-08    4.000000
            2017-01-08    3.333333
            2017-01-08    2.750000
2017-01-15  2017-01-15    5.000000
            2017-01-15    3.000000
            2017-01-15    3.000000
            2017-01-15    2.500000
2017-01-22  2017-01-22    3.000000
            2017-01-22    2.500000
            2017-01-22    2.333333
            2017-01-22    2.250000
2017-01-29  2017-01-29    3.000000
            2017-01-29    2.000000
            2017-01-29    3.000000
            2017-01-29    2.750000
2017-02-05  2017-02-05    2.000000
            2017-02-05    2.000000
            2017-02-05    2.666667
            2017-02-05    3.000000
2017-02-12  2017-02-12    5.000000
            2017-02-12    5.000000
            2017-02-12    4.000000
            2017-02-12    3.500000
2017-02-19  2017-02-19    3.000000
            2017-02-19    4.000000
                            ...   
2020-11-08  2020-11-08    2.333333
            2020-11-08    2.500000
2020-11-15  2020-11-15    5.000000
            2020-11-15    4.000000
            2020-11-15    3.666667
            2020-11-15    4.000000
2020-11-22  2020-11-22    2.000000
            2020-11-22    3.500000
            2020-11-22    4.000000
            2020-11-22    3.750000
2020-11-29  2020-11-29    2.000000
            2020-11-29    2.500000
            2020-11-29    3.000000
            2020-11-29    3.250000
2020-12-06  2020-12-06    4.000000
            2020-12-06    3.000000
            2020-12-06    2.666667
            2020-12-06    3.250000
2020-12-13  2020-12-13    5.000000
            2020-12-13    5.000000
            2020-12-13    4.666667
            2020-12-13    3.750000
2020-12-20  2020-12-20    1.000000
            2020-12-20    1.000000
            2020-12-20    1.333333
            2020-12-20    1.750000
2020-12-27  2020-12-27    3.000000
            2020-12-27    3.500000
            2020-12-27    3.000000
            2020-12-27    3.250000
Name: value, Length: 836, dtype: float64
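In the output above, the outer `DT` level (the group key) repeats the values of the inner `DT` level. If the duplicated name gets in the way downstream, one option is to drop the outer level, another is to give it a distinct name. This is a sketch assuming a recent pandas release; the tiny Series `s` and the level name `DT_group` are hypothetical stand-ins for the original data.

```python
import pandas as pd

# Hypothetical data: two dates, each appearing twice, in an index named "DT".
dates = pd.to_datetime(["2017-01-01", "2017-01-08"])
s = pd.Series([1, 2, 3, 4], index=dates.append(dates), name="value")
s.index.name = "DT"

# Same pattern as the report: group by the index level, then expand.
result = s.groupby(level=0).expanding().mean()

# Option 1: drop the redundant outer level (it repeats the inner DT values).
flat = result.droplevel(0)

# Option 2: keep both levels but give the group key a distinct name.
named = result.copy()
named.index = named.index.set_names(["DT_group", "DT"])

print(flat)
print(named.index.names)
```

Dropping the level keeps the original single `DT` index; renaming keeps the group key and makes name-based lookups unambiguous.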

@jschendel jschendel added this to the No action milestone Jul 5, 2018
@jschendel jschendel added Bug Groupby Regression Functionality that used to work in a prior pandas version Duplicate Report Duplicate issue or pull request labels Jul 5, 2018