BUG: Multi-column explode does not raise ValueError for length-1 and empty list #53512

bherwerth · 2023-06-03T09:54:56Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame([[[1], []]], columns=["A", "B"])
df.explode(column=["A", "B"]) # Does not raise ValueError: columns must have matching element counts

Issue Description

The lists [1] and [] are of unequal length, so I expect a ValueError, as is the case for length-2 vs. length-0 lists ([0, 1] vs. []).
I checked pandas version '1.5.2' where the ValueError is raised as expected.
I think this is related to BUG: df.explode mulitcol with Nan+emptylist #49680, where mylen is assigned 1 for both the empty and the length-1 list:

pandas/pandas/core/frame.py

Line 9284 in 37783d4

mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1
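
To make the effect of that line concrete, here is a minimal pure-Python sketch. The `mylen` lambda is copied from the line quoted above; `is_list_like` is a simplified stand-in for the pandas helper of the same name, and `counts_match` is a hypothetical helper that only approximates the row-wise comparison, so treat both as assumptions rather than the actual pandas implementation.

```python
from collections.abc import Iterable


def is_list_like(obj):
    # Simplified stand-in (assumption) for pandas' is_list_like helper:
    # any iterable except strings and bytes counts as list-like.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


# The lambda quoted above, copied verbatim.
mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1


def counts_match(row):
    # Hypothetical helper: True when every cell in the row gets the same count.
    return len({mylen(cell) for cell in row}) == 1


print(mylen([]), mylen([1]), mylen([0, 1]))  # 1 1 2
print(counts_match([[1], []]))               # True: [] is counted like a length-1 list
print(counts_match([[0, 1], []]))            # False: 2 vs. 1, so a mismatch is detected
```

Under these assumptions, the empty list and the length-1 list are indistinguishable to the count check, which would explain why only the length-2 vs. length-0 case is rejected.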

Expected Behavior

import pandas as pd

df = pd.DataFrame([[[1], []]], columns=["A", "B"])
df.explode(column=["A", "B"]) # Raises ValueError: columns must have matching element counts

Installed Versions

INSTALLED VERSIONS

commit : 37783d4
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.49-linuxkit
Version : #1 SMP Tue Sep 13 07:51:46 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0.dev0+900.g37783d4bcb
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 23.1.2
Cython : 0.29.33
pytest : 7.3.1
hypothesis : 6.75.9
sphinx : 6.2.1
blosc : 1.11.1
feather : None
xlsxwriter : 3.1.2
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : 2.9.6
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.7
brotli :
fastparquet : 2023.4.0
fsspec : 2023.5.0
gcsfs : 2023.5.0
matplotlib : 3.7.1
numba : 0.57.0
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.5.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.15
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.5.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

bherwerth · 2023-06-03T15:13:49Z

Reading the discussion in #46084, I now tend to think that allowing df.explode(column=["A", "B"]) is indeed the intended behavior and it's also consistent with exploding the empty list into nan in a single-column operation.
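
The single-column behavior referred to here can be checked with a small sketch. The frame and its column names are made up for illustration; the only pandas call used is `DataFrame.explode` on one column.

```python
import pandas as pd

# Illustrative data (assumption): the second row holds an empty list in "B".
df = pd.DataFrame({"A": [1, 2], "B": [[10, 20], []]})

# Exploding a single column keeps the row with the empty list and fills it with NaN.
out = df.explode("B")
print(out)
```

The row with the empty list is kept rather than dropped, so in a single-column explode an empty list occupies one output row, just like a length-1 list would.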

Maybe the current behavior is not 100% consistent with the docstring, where it says that it raises a ValueError:

pandas/pandas/core/frame.py

Lines 8803 to 8804 in 965ceca

    
* If specified columns to explode have not matching count of
  elements rowwise in the frame.

(I stumbled upon this when downgrading from pandas 2 to pandas<2 and the previous behavior made more sense to me at first, considering the docstring.)

My only suggestion would be to perhaps make the docstring clearer. For example, one could say the ValueError is raised if the row-wise counts are inconsistent and then perhaps explain that an empty list is consistent with a length-1 list.

Anyway, happy to close the issue if not relevant.

topper-123 · 2023-06-04T11:37:57Z

Yes, I think this is intended. It may not always be clear that turning an empty list into a nan is what the user wants, but that's the chosen route.

I'll close this issue. If you have something to add, feel free to add a comment and I'll reopen.

bherwerth added the Bug and Needs Triage labels on Jun 3, 2023

topper-123 added the Reshaping label and removed the Needs Triage label on Jun 4, 2023

topper-123 closed this as not planned on Jun 4, 2023
