Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

# %% Create data
data = pd.DataFrame([
    [1, 2, 3, 4],
    [1, 2, 3, 4]
], columns=['A', 'B', 'C', 'D'])

# %% Check missing data assignment: the Series has no label 'D'
addition = pd.DataFrame([
    [5, 6, 7],
    [1, 2, 3]
], columns=['A', 'B', 'C'])
data.iloc[0, :] = addition.iloc[0, :]  # Fails in 1.2.0 with ValueError (traceback below)

# %% Check misaligned assignment: same labels, columns C and D swapped
data = pd.DataFrame([
    [1, 2, 3, 4],
    [1, 2, 3, 4]
], columns=['A', 'B', 'C', 'D'])
addition_misaligned = pd.DataFrame([
    [5, 6, 7, 8],
    [1, 2, 4, 3]
], columns=['A', 'B', 'D', 'C'])
data.iloc[0, :] = addition_misaligned.iloc[0, :]  # No error, but values are assigned by position, not by label
pd.testing.assert_frame_equal(data, addition_misaligned, check_like=True)  # Fails in 1.2.0: C and D are swapped in row 0
Problem description
My apologies if this is two reports in one.
In pandas 1.1.5, assigning a Series that covers only some of the columns to a DataFrame row succeeded: the values were aligned on their labels, and the column missing from the Series received a missing value (NaN). In pandas 1.2.0 the same assignment raises a ValueError about broadcasting shapes.
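For reference, the 1.1.5 result described above can be produced explicitly: reindex the Series to the frame's columns (which inserts NaN for the missing label) and then assign the resulting array by position. This is a minimal sketch of a workaround, not a fix; it assumes the frame may be cast to float64 up front so that the NaN can be stored, which is our choice and not part of the original report.

```python
import pandas as pd

data = pd.DataFrame([[1, 2, 3, 4], [1, 2, 3, 4]], columns=["A", "B", "C", "D"])
addition = pd.DataFrame([[5, 6, 7], [1, 2, 3]], columns=["A", "B", "C"])

# Assumption: cast to float64 first so the NaN for column D fits
# without an int64 -> float64 dtype change during the assignment.
data = data.astype("float64")

# Align on labels first: columns absent from the Series (here D) become NaN.
row = addition.iloc[0].reindex(data.columns)

# The values are now in the frame's column order, so positional assignment is safe.
data.iloc[0, :] = row.to_numpy()

print(data)
```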
Relatedly, assigning a Series that covers every column does not raise, but its values are placed by position rather than matched to the columns by label. In pandas 1.1.5 these values were also aligned on labels.
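Label-based assignment with `.loc` does align a Series value on the column labels, so it can serve as a workaround for the misaligned case. A sketch, assuming the target row is addressed by its index label (`0`, the default RangeIndex label, here); it illustrates label alignment only and makes no claim about what `.iloc` should do.

```python
import pandas as pd

data = pd.DataFrame([[1, 2, 3, 4], [1, 2, 3, 4]], columns=["A", "B", "C", "D"])
addition_misaligned = pd.DataFrame(
    [[5, 6, 7, 8], [1, 2, 4, 3]], columns=["A", "B", "D", "C"]
)

# .loc aligns the Series on its index (the labels A, B, D, C),
# so 7 lands in column D and 8 lands in column C.
data.loc[0, :] = addition_misaligned.iloc[0]

# Both frames now hold the same values per label, just in a different column order.
pd.testing.assert_frame_equal(data, addition_misaligned, check_like=True)
print(data)
```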
Based on the discussion in #39004, this may be related; the error is similar to the one reported in a comment there.
Traceback (most recent call last):
  File "venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-c0c0cf213f59>", line 13, in <module>
    data.iloc[0, :] = addition.iloc[0, :] # Fails
  File "venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 695, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 1644, in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
  File "venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 1872, in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
  File "venv/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 565, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "venv/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 431, in apply
    applied = getattr(b, f)(**kwargs)
  File "venv/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 994, in setitem
    values[indexer] = value
ValueError: could not broadcast input array from shape (3) into shape (4)
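The last frame of the traceback is a plain NumPy assignment (`values[indexer] = value`). Assuming the Series reaches that point as its three raw values, without its labels, the same error can be reproduced with NumPy alone, which suggests the label alignment is skipped before the block is written. The 2x4 array below only mimics the shapes involved; pandas' actual internal block layout may differ.

```python
import numpy as np

# Stand-in for the frame's values: 2 rows x 4 columns (A, B, C, D).
# Assumption: this mimics the shapes involved, not pandas' real block layout.
values = np.array([[1, 2, 3, 4], [1, 2, 3, 4]])

# The Series' raw values with the labels stripped: only A, B, C.
value = np.array([5, 6, 7])

error_message = None
try:
    values[0, :] = value  # 3 elements cannot broadcast into a 4-element row
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```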
Expected Output
The assigned values should be matched to the DataFrame's columns using the labels of the Series being assigned, as they were in pandas 1.1.5, rather than by position.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_GB.UTF-8
pandas : 1.2.0
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None