Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 1 addition & 7 deletions python/pyspark/pandas/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1917,18 +1917,12 @@ def append(self, other: "Index") -> "Index":
sdf_other = other._internal.spark_frame.select(other._internal.index_spark_columns)
sdf_appended = sdf_self.union(sdf_other)

# names should be kept when MultiIndex, but Index wouldn't keep its name.
if isinstance(self, MultiIndex):
index_names = self._internal.index_names
else:
index_names = None

internal = InternalFrame(
spark_frame=sdf_appended,
index_spark_columns=[
scol_for(sdf_appended, col) for col in self._internal.index_spark_column_names
],
index_names=index_names,
index_names=None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simply set the index_names to None to follow the behavior of Pandas, since Pandas doesn't keep the name of MultiIndex when computing the append from Pandas 2.0.0. (See pandas-dev/pandas#48288 more detail)

cc @zhengruifeng @HyukjinKwon as CI passed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a bug in Pandas that might be fixed in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's an intentional behavior since they mentioned this in "Bug fixes" section in their release note?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I thought it was a bug in Pandas that is fixed in the Pandas 2.0.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we also mention this in our migration doc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I misunderstood the Pandas PR. LTGM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we also mention this in our migration doc?

Hmm.. I didn't mention this as a behavior change since it's a bug fix, but on second thought maybe we'd better to mention in the migration guide anyway.

Let me create a follow-up for updating the migration guide.

index_fields=index_fields,
)

Expand Down
26 changes: 3 additions & 23 deletions python/pyspark/pandas/tests/indexes/test_base_slow.py
Original file line number Diff line number Diff line change
Expand Up @@ -107,29 +107,9 @@ def test_append(self):
psmidx1 = ps.from_pandas(pmidx1)
psmidx2 = ps.from_pandas(pmidx2)

# TODO(SPARK-43241): MultiIndex.append not checking names for equality.
# Also refer to https://github.com/pandas-dev/pandas/pull/48288.
if LooseVersion(pd.__version__) >= LooseVersion("2.0.0"):
self.assert_eq(
pmidx1.append(pmidx2), psmidx1.append(psmidx2).rename([None, None, None])
)
else:
self.assert_eq(pmidx1.append(pmidx2), psmidx1.append(psmidx2))

if LooseVersion(pd.__version__) >= LooseVersion("2.0.0"):
self.assert_eq(
pmidx2.append(pmidx1), psmidx2.append(psmidx1).rename([None, None, None])
)
else:
self.assert_eq(pmidx2.append(pmidx1), psmidx2.append(psmidx1))

if LooseVersion(pd.__version__) >= LooseVersion("2.0.0"):
self.assert_eq(
pmidx1.append(pmidx2).names,
psmidx1.append(psmidx2).rename([None, None, None]).names,
)
else:
self.assert_eq(pmidx1.append(pmidx2).names, psmidx1.append(psmidx2).names)
self.assert_eq(pmidx1.append(pmidx2), psmidx1.append(psmidx2))
self.assert_eq(pmidx2.append(pmidx1), psmidx2.append(psmidx1))
self.assert_eq(pmidx1.append(pmidx2).names, psmidx1.append(psmidx2).names)

# Index & MultiIndex is currently not supported
expected_error_message = r"append\(\) between Index & MultiIndex is currently not supported"
Expand Down