Skip to content

[SPARK-40589][PS][TEST] Fix test for DataFrame.corr_with skip the pandas regression #38031

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Sep 28, 2022

What changes were proposed in this pull request?

This PR proposes to skip the DataFrame.corr_with test when the other is pyspark.pandas.Series and the method is "spearman" or "pearson", since there is regression in pandas 1.5.0 for that cases.

Why are the changes needed?

There are some regressions in pandas 1.5.0, so we're not going to match the behavior for those cases.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually tested with pandas 1.5.0, confirmed the test pass.

@@ -6076,7 +6076,13 @@ def test_corrwith(self):
def _test_corrwith(self, psdf, psobj):
pdf = psdf.to_pandas()
pobj = psobj.to_pandas()
for method in ["pearson", "spearman", "kendall"]:
# Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman"
# See https://github.com/pandas-dev/pandas/issues/48826
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This is the report to the pandas community to confirm if it officially regression in pandas.

@xinrong-meng
Copy link
Member

Good catch! LGTM if the regression is confirmed from the pandas community.

# Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman"
# See https://github.com/pandas-dev/pandas/issues/48826
if LooseVersion(pd.__version__) >= LooseVersion("1.5.0") and isinstance(pobj, pd.Series):
methods = ["kendall"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why also skipping pearson ? it is also changed in pandas 1.5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the regression is shown in both pearson and spearman 😿

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about also adding this link together with https://github.com/pandas-dev/pandas/issues/48826?
since https://github.com/pandas-dev/pandas/issues/48826 is only related to spearman

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Let me add

@itholic
Copy link
Contributor Author

itholic commented Sep 30, 2022

Good catch! LGTM if the regression is confirmed from the pandas community.

Yes, it's confirmed by pandas-dev/pandas#48826

Copy link
Contributor

@zhengruifeng zhengruifeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@zhengruifeng
Copy link
Contributor

Merged into master, thank you @itholic for this fix.

@itholic
Copy link
Contributor Author

itholic commented Sep 30, 2022

Thanks for the review, @zhengruifeng and @xinrong-meng !

@itholic itholic deleted the SPARK-40589 branch April 22, 2023 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants