[SPARK-38800][DOCS][PYTHON] Explicitly document the supported pandas version. #36095
python/docs/source/index.rst
Outdated
@@ -47,6 +47,8 @@ With this package, you can:
* Have a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets).
* Switch to pandas API and PySpark API contexts easily without any overhead.

Note that pandas has different behavior per its version, and pandas API on Spark tries to match the behavior of pandas 1.4.
Let's add this to the https://spark.apache.org/docs/latest/api/python/reference/index.html page. Also, change "pandas 1.4." to "latest pandas release."
We should specify the version only in the other branches.
Sure, then can I create separate PRs for branch-3.2 and branch-3.3?
I believe that branch-3.2 supports pandas 1.3 and branch-3.3 supports pandas 1.4.
FYI, some tests fail with pandas 1.4.x. I haven't had enough time to find the cause of every failure yet; some of them are pandas bugs in 1.4.x (such as pandas-dev/pandas#46589). We have only tested 1.3.x with the GitHub Actions CI.
I created SPARK-38819 to track the issue "Run Pandas on Spark with Pandas 1.4.x".
@Yikun Thanks for the testing!! Let me help take a look.
+1: on master, the docs should state the supported version as the latest pandas release.
I use Gmail to check English. If you start a new email in Gmail and you paste the sentence in, you will see that a "the" is missing.
Note that pandas has different behavior per its version, and pandas API on Spark tries to match the behavior of the latest pandas release.
Thanks, @bjornjorgensen for the tip 👍 Just corrected.
Merged to master.
@itholic mind creating a PR for branch-3.3?
…ndas version

### What changes were proposed in this pull request?
This PR proposes to document the supported pandas version for pandas API on Spark. #36095 is the corresponding PR for the master branch.

### Why are the changes needed?
Since the behavior of pandas differs between versions, it is better to document the supported pandas version explicitly so that users won't be confused. pandas API on Spark aims to match the behavior of pandas 1.3.

### Does this PR introduce _any_ user-facing change?
Yes, the supported pandas version is now mentioned on the PySpark API reference page.

### How was this patch tested?
The existing doc build should cover this.

Closes #36114 from itholic/SPARK-38800-3.3.

Authored-by: itholic <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR proposes to document the supported pandas version for pandas API on Spark.
Why are the changes needed?
Since the behavior of pandas differs between versions, it is better to document the supported pandas version explicitly so that users won't be confused.
Does this PR introduce any user-facing change?
Yes, the supported pandas version is now mentioned on the PySpark API reference page.
How was this patch tested?
The existing doc build should cover this.
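Since the documented behavior is tied to a particular pandas release series, a user may want to check their installed pandas against the version the docs name. The snippet below is only a rough sketch of such a check, not part of this PR or of PySpark: the helper names `major_minor` and `matches_documented_version` are hypothetical, and the "1.3" target is taken from the branch-3.3 commit message above, so it would differ per branch.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.4.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_documented_version(installed: str, documented: str) -> bool:
    """Return True when both versions belong to the same major.minor series."""
    return major_minor(installed) == major_minor(documented)


if __name__ == "__main__":
    try:
        import pandas as pd
    except ImportError:
        print("pandas is not installed")
    else:
        # Assumption: "1.3" is the series named for branch-3.3; adjust per branch.
        print(pd.__version__, matches_documented_version(pd.__version__, "1.3"))
```

Comparing only the major.minor pair reflects how the thread talks about versions ("1.3.x", "1.4.x"); patch releases are treated as the same series.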