-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-40589][PS][TEST] Fix test for DataFrame.corr_with
skip the pandas regression
#38031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…andas regression
@@ -6076,7 +6076,13 @@ def test_corrwith(self): | |||
def _test_corrwith(self, psdf, psobj): | |||
pdf = psdf.to_pandas() | |||
pobj = psobj.to_pandas() | |||
for method in ["pearson", "spearman", "kendall"]: | |||
# Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman" | |||
# See https://github.com/pandas-dev/pandas/issues/48826 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: This is the report to the pandas community to confirm if it officially regression in pandas.
Good catch! LGTM if the regression is confirmed from the pandas community. |
# Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman" | ||
# See https://github.com/pandas-dev/pandas/issues/48826 | ||
if LooseVersion(pd.__version__) >= LooseVersion("1.5.0") and isinstance(pobj, pd.Series): | ||
methods = ["kendall"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why also skipping pearson
? it is also changed in pandas 1.5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the regression is shown in both pearson
and spearman
😿
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about also adding this link together with https://github.com/pandas-dev/pandas/issues/48826
?
since https://github.com/pandas-dev/pandas/issues/48826
is only related to spearman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Let me add
Yes, it's confirmed by pandas-dev/pandas#48826 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
Merged into master, thank you @itholic for this fix. |
Thanks for the review, @zhengruifeng and @xinrong-meng ! |
What changes were proposed in this pull request?
This PR proposes to skip the
DataFrame.corr_with
test when theother
ispyspark.pandas.Series
and themethod
is "spearman" or "pearson", since there is regression in pandas 1.5.0 for that cases.Why are the changes needed?
There are some regressions in pandas 1.5.0, so we're not going to match the behavior for those cases.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested with pandas 1.5.0, confirmed the test pass.