Why .sort_values() on column containing same values shuffles entire dataframe ? #39877
Comments
@artsheiko I think this behavior is intended for sort_values (nothing is moved for col1 since it's already sorted), and the entire row, including the index, is swapped in the DataFrame during sorting. If you want to reset the index (which I think you're trying to do), you can add
@lithomas1, |
What do you expect?
The problem is that sorting returns a dataframe whose rows look randomly shuffled, which is not obvious when we do not know in advance whether a column consists of a single unique value.
The documentation points to |
I just got burnt by this. I sorted my dataframe once on GPU Machine 1, then sorted it on CPU Machine 2, and found out that they have different indices! Now I have to write a workaround for my project. @attack68 Would you be interested if we wrote a warning that sort_values is not reproducible across machines, or maybe added a random_state param to ensure reproducibility? Or is it not worthwhile? It was certainly a surprise to me!
df.sort_values('col2', ascending=True, kind='stable') will not reorder the index of rows that have the same value.
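To illustrate the kind='stable' suggestion above, here is a minimal sketch. The dataframe is made up for illustration (the column names col1 and col2 follow the thread, the values are assumptions), and 100 rows are used because small inputs may happen to come back in order even with the default sort.

```python
import pandas as pd

# Hypothetical data: col1 holds one repeated value, col2 is arbitrary.
df = pd.DataFrame({"col1": [1] * 100, "col2": range(100)})

# The default kind is 'quicksort', which makes no promise about the
# relative order of rows whose sort key is equal.
default_sorted = df.sort_values("col1")

# A stable sort keeps tied rows in their original relative order.
stable_sorted = df.sort_values("col1", kind="stable")

# With every key tied, a stable sort should return the input order.
order_preserved = list(stable_sorted.index) == list(df.index)
print(order_preserved)
```

Whether default_sorted comes back shuffled can depend on the data size and the NumPy build, so the sketch only checks the stable result.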
Why does this operation disorder the entire dataframe? The same result is produced if the function is applied to col1.