BUG: iloc assignment in Pandas 1.2.0 #39261

@quant-dc

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

# %% Create data

data = pd.DataFrame([
    [1,2,3,4],
    [1,2,3,4]
], columns=['A', 'B', 'C', 'D'])

# %% Check missing data assignment
addition = pd.DataFrame([
    [5,6,7],
    [1,2,3]
], columns=['A', 'B', 'C'])

data.iloc[0, :] = addition.iloc[0, :]  # Fails in 1.2.0 with ValueError (see traceback below)

# %% Check misaligned assignment
data = pd.DataFrame([
    [1,2,3,4],
    [1,2,3,4]
], columns=['A', 'B', 'C', 'D'])

addition_misaligned = pd.DataFrame([
    [5,6,7,8],
    [1,2,4,3]
], columns=['A', 'B', 'D', 'C'])

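data.iloc[0, :] = addition_misaligned.iloc[0, :]  # Runs, but values are assigned by position, not by label
# Passes only if the row was aligned on column labels (check_like ignores column order)
pd.testing.assert_frame_equal(data, addition_misaligned, check_like=True)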

Problem description

My apologies if this is two reports in one.

In pandas 1.1.5, assigning a partial Series to a row of a DataFrame succeeded: the values were aligned on their labels, and any column absent from the Series received a missing value. In 1.2.0 the same assignment raises a broadcast shape error.

Relatedly, assigning a fully specified Series runs without error, but the values are no longer aligned on their labels; they are assigned by position instead. In pandas 1.1.5 the values were aligned.

Having read the discussion on #39004, I think this may be related; the error is similar to the one in this comment.
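To make the difference between the two interpretations concrete, the sketch below applies both to the first row of `addition_misaligned`, and then aligns a partial row. It uses only `Series.reindex` and `Series.to_numpy`; the names `target_columns`, `by_label`, `by_position` and `partial` are illustrative and not part of the report.

```python
import pandas as pd

target_columns = pd.Index(['A', 'B', 'C', 'D'])

# First row of addition_misaligned: note that 'D' comes before 'C'.
row = pd.Series([5, 6, 7, 8], index=['A', 'B', 'D', 'C'])

# Label-based alignment: each value follows its column label.
by_label = row.reindex(target_columns)

# Positional interpretation: labels are ignored, values keep their order.
by_position = pd.Series(row.to_numpy(), index=target_columns)

print(by_label.tolist())     # [5, 6, 8, 7]
print(by_position.tolist())  # [5, 6, 7, 8]

# A partial row (no 'D') aligned by label gets a missing value for 'D'.
partial = pd.Series([5, 6, 7], index=['A', 'B', 'C'])
partial_aligned = partial.reindex(target_columns)
print(partial_aligned.isna().tolist())  # [False, False, False, True]
```

The first result is what the report describes for 1.1.5, the second is what 1.2.0 produces for the fully specified case.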

Traceback (most recent call last):
  File "venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-c0c0cf213f59>", line 13, in <module>
    data.iloc[0, :] = addition.iloc[0, :]  # Fails
  File "venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 695, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 1644, in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
  File "venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 1872, in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
  File "venv/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 565, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "venv/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 431, in apply
    applied = getattr(b, f)(**kwargs)
  File "venv/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 994, in setitem
    values[indexer] = value
ValueError: could not broadcast input array from shape (3) into shape (4)
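
The last frame of the traceback, `values[indexer] = value`, looks like an ordinary NumPy item assignment, which suggests the Series reached that point without being aligned to the four columns. As a minimal sketch under that assumption, the same kind of error can be reproduced without pandas; the array names are illustrative, and the exact wording of the message may differ between NumPy versions.

```python
import numpy as np

# Stand-in for the block values of the 2x4 DataFrame.
values = np.array([[1, 2, 3, 4], [1, 2, 3, 4]])
# Stand-in for the three-element row being assigned.
value = np.array([5, 6, 7])

raised = False
try:
    values[0, :] = value  # three values cannot fill a row of four
except ValueError as exc:
    raised = True
    print(type(exc).__name__, exc)
```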


Expected Output

Assignment to the DataFrame row should respect the labels of the assigned Series, as it did in pandas 1.1.5.
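
One way to get the label-respecting result regardless of how `iloc` treats the Series is to align explicitly before assigning. The following is a sketch of a possible workaround, not a fix in pandas itself: it reindexes the row to the target columns and assigns the raw values, so that positional and label-based assignment coincide. Casting to float in the partial case is an assumption made here so that the missing value can be stored; the names `aligned`, `data_partial` and `aligned_partial` are illustrative.

```python
import numpy as np
import pandas as pd

# Fully specified but misaligned row.
data = pd.DataFrame([[1, 2, 3, 4], [1, 2, 3, 4]], columns=['A', 'B', 'C', 'D'])
addition_misaligned = pd.DataFrame(
    [[5, 6, 7, 8], [1, 2, 4, 3]], columns=['A', 'B', 'D', 'C']
)

# Reorder the row to match data's columns, then assign the plain values.
aligned = addition_misaligned.iloc[0, :].reindex(data.columns)
data.iloc[0, :] = aligned.to_numpy()
pd.testing.assert_frame_equal(data, addition_misaligned, check_like=True)

# Partial row: column 'D' is absent and becomes NaN after reindexing.
data_partial = pd.DataFrame(
    [[1, 2, 3, 4], [1, 2, 3, 4]], columns=['A', 'B', 'C', 'D']
).astype(float)
addition = pd.DataFrame([[5, 6, 7], [1, 2, 3]], columns=['A', 'B', 'C'])

aligned_partial = addition.iloc[0, :].reindex(data_partial.columns)
data_partial.iloc[0, :] = aligned_partial.to_numpy()
print(data_partial)
```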

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_GB.UTF-8
pandas : 1.2.0
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Labels

Bug, Needs Triage (issue that has not been reviewed by a pandas team member)
