I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame([[[1], []]], columns=["A", "B"])
df.explode(column=["A", "B"]) # Does not raise ValueError: columns must have matching element counts
Issue Description
The lists [1] and [] are of unequal length, so I expect a ValueError, as is the case for length-2 vs length-0 lists ([0, 1] vs []).
I checked pandas version '1.5.2' where the ValueError is raised as expected.
mylen = lambda x: len(x) if (is_list_like(x) and len(x) > 0) else 1
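To illustrate why the two cases differ, here is a minimal pure-Python sketch of the counting rule quoted above. It is not the pandas implementation: `is_list_like_simple` is a simplified, hypothetical stand-in for pandas' `is_list_like`, and `counts_match` is an illustrative helper that only mimics the row-wise comparison.

```python
from collections.abc import Iterable


def is_list_like_simple(obj):
    # Hypothetical stand-in for pandas' is_list_like:
    # treat any iterable except str/bytes as list-like.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def mylen(x):
    # Same rule as the lambda in the issue: empty list-likes and scalars count as 1.
    return len(x) if (is_list_like_simple(x) and len(x) > 0) else 1


def counts_match(row):
    # Illustrative helper: do all cells in a row have the same effective count?
    counts = [mylen(cell) for cell in row]
    return all(c == counts[0] for c in counts)


one_vs_empty = counts_match([[1], []])     # counts are 1 and 1
two_vs_empty = counts_match([[0, 1], []])  # counts are 2 and 1
print(one_vs_empty, two_vs_empty)
```

Under this rule the empty list and the length-1 list both count as one element, so no mismatch is detected for `[1]` vs `[]`, whereas `[0, 1]` vs `[]` still differs.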
Expected Behavior
import pandas as pd
df = pd.DataFrame([[[1], []]], columns=["A", "B"])
df.explode(column=["A", "B"]) # Raises ValueError: columns must have matching element counts
Installed Versions
INSTALLED VERSIONS
commit : 37783d4
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.49-linuxkit
Version : #1 SMP Tue Sep 13 07:51:46 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
Reading the discussion in #46084, I now tend to think that allowing df.explode(column=["A", "B"]) is indeed the intended behavior and it's also consistent with exploding the empty list into nan in a single-column operation.
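The single-column behavior referred to here can be checked with a short sketch. The frame below reuses the data from the reproducible example; the variable names are only illustrative.

```python
import pandas as pd

# Same data as in the report: one row, A holds [1], B holds an empty list.
df = pd.DataFrame([[[1], []]], columns=["A", "B"])

# Exploding only the column with the empty list keeps the row
# and fills the exploded cell with a missing value.
exploded_b = df.explode("B")

n_rows = len(exploded_b)
b_is_missing = exploded_b["B"].isna().tolist()
print(n_rows, b_is_missing)
```

Since an empty list already occupies one output row in the single-column case, treating it as compatible with a length-1 list in the multi-column case follows the same convention.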
Maybe the current behavior is not 100% consistent with the docstring, where it says that it raises a ValueError:
* If specified columns to explode have not matching count of
elements rowwise in the frame.
(I stumbled upon this when downgrading from pandas 2 to pandas<2 and the previous behavior made more sense to me at first, considering the docstring.)
My only suggestion would be to perhaps make the docstring clearer. For example, one could say the ValueError is raised if the row-wise counts are inconsistent and then perhaps explain that an empty list is consistent with a length-1 list.
Yes, I think this is intended. It's maybe not always clear whether turning an empty list into NaN is what the user intends, but that's the chosen route.
I'll close this issue. If you've got something to add, feel free to add a comment and I'll reopen.
Issue Description
The lists [1] and [] are of unequal length, so I expect a ValueError, as is the case for length-2 vs length-0 lists ([0, 1] vs []).
mylen is assigned 1 for both the empty and the length-1 list:
pandas/pandas/core/frame.py
Line 9284 in 37783d4
Installed Versions
INSTALLED VERSIONS
commit : 37783d4
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.49-linuxkit
Version : #1 SMP Tue Sep 13 07:51:46 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.0.dev0+900.g37783d4bcb
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 23.1.2
Cython : 0.29.33
pytest : 7.3.1
hypothesis : 6.75.9
sphinx : 6.2.1
blosc : 1.11.1
feather : None
xlsxwriter : 3.1.2
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : 2.9.6
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.7
brotli :
fastparquet : 2023.4.0
fsspec : 2023.5.0
gcsfs : 2023.5.0
matplotlib : 3.7.1
numba : 0.57.0
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.5.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.15
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.5.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None