
BUG: Multi-column explode does not raise ValueError for length-1 and empty list #53512


Closed
3 tasks done
bherwerth opened this issue Jun 3, 2023 · 2 comments
Labels: Bug, Reshaping

Comments

@bherwerth
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame([[[1], []]], columns=["A", "B"])
df.explode(column=["A", "B"]) # Does not raise ValueError: columns must have matching element counts

Issue Description

  • The lists [1] and [] are of unequal length, so I expect a ValueError, as is the case for length-2 vs length-0 lists ([0, 1] vs []).
  • I checked pandas version 1.5.2, where the ValueError is raised as expected.
  • I think this is related to BUG: df.explode mulitcol with Nan+emptylist #49680, where mylen is assigned 1 for both the empty and the length-1 list:
    mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1
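The counting rule quoted above can be sketched in plain Python to show why [1] and [] no longer disagree. This is a simplified stand-in, not pandas' code: `isinstance(x, (list, tuple))` replaces pandas' `is_list_like`, and `counts_match` is a hypothetical helper that mirrors the row-wise count comparison behind the "matching element counts" error.

```python
def mylen(x):
    # Same rule as the quoted lambda: a non-empty list-like counts as len(x);
    # anything else (an empty list, a scalar, NaN) counts as 1.
    # Assumption: list/tuple stands in for pandas' is_list_like.
    return len(x) if (isinstance(x, (list, tuple)) and len(x) > 0) else 1


def counts_match(row):
    """Hypothetical helper: True if every cell in the row has the same count."""
    return len({mylen(cell) for cell in row}) == 1


print(counts_match([[1], []]))     # both count as 1, so no ValueError
print(counts_match([[0, 1], []]))  # 2 vs 1, so a ValueError would be raised
```

Because `mylen([])` and `mylen([1])` both return 1, the length-1 vs empty case passes the check, while the length-2 vs empty case still fails it.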

Expected Behavior

import pandas as pd

df = pd.DataFrame([[[1], []]], columns=["A", "B"])
df.explode(column=["A", "B"]) # Raises ValueError: columns must have matching element counts

Installed Versions

INSTALLED VERSIONS

commit : 37783d4
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.49-linuxkit
Version : #1 SMP Tue Sep 13 07:51:46 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0.dev0+900.g37783d4bcb
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 23.1.2
Cython : 0.29.33
pytest : 7.3.1
hypothesis : 6.75.9
sphinx : 6.2.1
blosc : 1.11.1
feather : None
xlsxwriter : 3.1.2
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : 2.9.6
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.7
brotli :
fastparquet : 2023.4.0
fsspec : 2023.5.0
gcsfs : 2023.5.0
matplotlib : 3.7.1
numba : 0.57.0
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.5.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.15
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.5.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

bherwerth added the Bug and Needs Triage labels on Jun 3, 2023
@bherwerth
Contributor Author

Reading the discussion in #46084, I now think that allowing df.explode(column=["A", "B"]) is indeed the intended behavior; it is also consistent with how a single-column explode turns an empty list into NaN.
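A minimal sketch of that consistency argument, assuming pandas 2.x (where, per this thread, an empty list is counted as 1 in the multi-column check). A single-column explode turns an empty list into one NaN row, so in a multi-column explode [1] and [] each occupy one row, while [0, 1] vs [] still raises.

```python
import pandas as pd

# Single-column explode: [1] yields one row and the empty list yields one NaN row.
single = pd.Series([[1], []]).explode()

# Multi-column explode: [1] and [] both occupy one row after exploding,
# so their row-wise counts match and no ValueError is raised (pandas 2.x).
multi = pd.DataFrame([[[1], []]], columns=["A", "B"]).explode(column=["A", "B"])

# Lists of length 2 and 0 still disagree (2 rows vs 1 row), so this raises.
error_message = None
try:
    pd.DataFrame([[[0, 1], []]], columns=["A", "B"]).explode(column=["A", "B"])
except ValueError as err:
    error_message = str(err)

print(single)
print(multi)
print(error_message)
```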

However, the current behavior may not be fully consistent with the docstring, which says that a ValueError is raised:

pandas/pandas/core/frame.py, lines 8803 to 8804 in 965ceca:

* If specified columns to explode have not matching count of
elements rowwise in the frame.

(I stumbled upon this when downgrading from pandas 2 to pandas<2 and the previous behavior made more sense to me at first, considering the docstring.)

My only suggestion would be to make the docstring clearer. For example, it could say that the ValueError is raised if the row-wise counts are inconsistent, and then explain that an empty list counts as consistent with a length-1 list.

Anyway, happy to close the issue if not relevant.

topper-123 added the Reshaping label and removed the Needs Triage label on Jun 4, 2023
@topper-123
Contributor

Yes, I think this is intended. It may not always be clear that turning an empty list into NaN is what the user intends, but that's the chosen route.

I'll close this issue. If you have something to add, feel free to comment and I'll reopen.

topper-123 closed this as not planned on Jun 4, 2023