DOC: Key order after dataframe inner merge #53157
Comments
@lucdem Could you please provide the issue with a name? |
I'd say "order the results based on the first occurrence of the left key" is more correct than "sort the results based on the left key"; see the example below:
import pandas as pd
df = pd.DataFrame({
'n': [2, 1, 3, 1, 2, 3], # changed order
'i': [0, 1, 2, 3, 4, 5]
})
df2 = pd.DataFrame({
'n': [1, 2, 3],
'str': ['1', '2', '3']
})
print(df.merge(df2, on='n', how='inner'))
print('--------')
print(df.merge(df2, on='n', how='left'))
with the result now being:
Note that 2 comes before 1 in the inner example. If you've got an improvement to the current wording, you're welcome to submit a PR. |
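To make the distinction between "sorted", "ordered by first occurrence", and "left row order preserved" concrete, here is a rough sketch that labels the key order of a merge result. The helper `classify_key_order` is hypothetical (it is not part of pandas), and it assumes every left row matches exactly one right row, as in the example above. Which label the inner merge gets may depend on the pandas version, so no expected output is given for it.

```python
import pandas as pd


def classify_key_order(result_keys, left_keys):
    """Label how result_keys are ordered relative to left_keys.

    Hypothetical helper; assumes each left row appears exactly once in the result.
    """
    result_keys = list(result_keys)
    left_keys = list(left_keys)
    if result_keys == left_keys:
        return "left row order preserved"
    # Position at which each key first appears in the left frame.
    first_seen = {}
    for pos, key in enumerate(left_keys):
        first_seen.setdefault(key, pos)
    if result_keys == sorted(left_keys, key=first_seen.__getitem__):
        return "grouped by first occurrence in left"
    if result_keys == sorted(left_keys):
        return "sorted by key value"
    return "other"


df = pd.DataFrame({"n": [2, 1, 3, 1, 2, 3], "i": [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"n": [1, 2, 3], "str": ["1", "2", "3"]})

inner = df.merge(df2, on="n", how="inner")
left = df.merge(df2, on="n", how="left")

print("inner:", classify_key_order(inner["n"], df["n"]))
print("left: ", classify_key_order(left["n"], df["n"]))
```

Running this on different pandas versions is a quick way to see which of the behaviours discussed here applies to a given installation.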
BTW, I think the current wording is actually correct, but maybe can be clearer. I think I'll close this issue unless you can show an error in the docs, but an improved doc string is still welcome. |
But the order of the keys is not preserved at all; the keys are sorted. In this example the order of
It would even be better not to mention it, because now it's mentioned wrong. |
That did not happen in your original post though; this is different. In the original post the keys were ordered by first appearance. I can see that in the new example the result is sorted, which is not right. So the issue is really "Key order after dataframe inner merge on index" (but not on columns), and this is a bug. Do you agree? |
Can I work on improving the explanation/documentation provided under pandas.merge() ? |
Pandas version checks
main
here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
Documentation problem
The documentation for the 'how' parameter says:
From this description you could expect that the result of
df1.merge(df2, on='column_name', how='inner')
and
df1.merge(df2, on='column_name', how='left')
would both maintain the same order; however, that's not what happens.
Example below, tested on version 2.0.1:
Output:
Suggested fix for documentation
Either clarify that the merge operation will sort the results based on the left key when using 'inner' for the 'how' parameter, rather than "preserve the order", or explicitly state that the operation does not guarantee any order.
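Until the wording is settled, code that needs the left row order after an inner merge can enforce it explicitly instead of relying on the documented behaviour. The following is only a sketch of one possible workaround: the function `merge_keep_left_order` and the temporary column name `_left_pos` are made up for illustration and are not part of pandas.

```python
import pandas as pd


def merge_keep_left_order(left, right, on):
    """Inner merge whose rows follow the original row order of `left`.

    Hypothetical helper; `_left_pos` is an assumed-unused temporary column name.
    """
    pos = "_left_pos"
    # Record each left row's position before merging.
    tagged = left.assign(**{pos: range(len(left))})
    merged = tagged.merge(right, on=on, how="inner")
    # Restore the left order explicitly, then drop the helper column.
    return (
        merged.sort_values(pos, kind="stable")
        .drop(columns=pos)
        .reset_index(drop=True)
    )


df = pd.DataFrame({"n": [2, 1, 3, 1, 2, 3], "i": [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"n": [1, 2, 3], "str": ["1", "2", "3"]})

ordered = merge_keep_left_order(df, df2, on="n")
print(ordered)
```

Because the order is restored by an explicit sort, the result does not depend on how a particular pandas version orders inner-merge output.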