Skip to content

Merge duplicate contributors 👤👤👤👤 #447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
o2sh opened this issue Jun 18, 2021 · 14 comments
Closed

Merge duplicate contributors 👤👤👤👤 #447

o2sh opened this issue Jun 18, 2021 · 14 comments
Assignees
Labels
discussion enhancement New feature or request help wanted Extra attention is needed

Comments

@o2sh
Copy link
Owner

o2sh commented Jun 18, 2021

Since @yoichi's PR #243: we identify each contributor by their email address instead of their username which is definitely the right solution. However in some cases, a user may have changed his GH email along the way and therefore appear twice - or more - in the Authors section:

Screenshot 2021-06-18 at 21 48 25

It may be a good idea to have a second pass in get_authors and merge the lines that share the same username.to_lowercase()

@o2sh o2sh added enhancement New feature or request discussion labels Jun 18, 2021
@o2sh o2sh changed the title Merge duplicate contributors Merge duplicate contributors 👤👤👤👤 Jun 18, 2021
@o2sh o2sh added the help wanted Extra attention is needed label Jun 24, 2021
@HallerPatrick
Copy link
Contributor

Sounds like a good idea to me.I would be interested in doing this. @o2sh is it still open?

@o2sh
Copy link
Owner Author

o2sh commented Jun 24, 2021

It's all yours 👍

@HallerPatrick
Copy link
Contributor

@o2sh While currently working on this, I am asking myself, why it is the better way to identify by email in the first place and not the username, when only the email can change and not the username? Or am I missing something?

@o2sh
Copy link
Owner Author

o2sh commented Jun 25, 2021

You can actually also change your GitHub username, even though it's less common.

The username/email that appear in the git log (and parsed by onefetch) are taken from the .gitconfig, so if your SSH key is properly configured: you can actually put whatever you want in their and still be able to push to your repository, - or in HTTPS, if you login with username/password. On GitHub, each commit is linked to an account based solely on its email adress(es).

I agree that merging based on the email seems arbitrary but so it is if we would have chosen the username: both give unsatisfactory results. Hence the second pass.

@HallerPatrick
Copy link
Contributor

Alrighty, that seems reasonable 👍

@HallerPatrick
Copy link
Contributor

Made a pull request. It is still not perfect, because there are also commits with a different username and email. We would have to make guesses, if they are the same person, where I would refrain from...

Screenshot 2021-06-26 at 15 02 28

@yoichi
Copy link
Contributor

yoichi commented Jun 26, 2021

Git has a mechanism for identifying users by .mailmap file. Is that not enough?
https://git-scm.com/docs/gitmailmap

@o2sh
Copy link
Owner Author

o2sh commented Jun 26, 2021

To be honest, I wasn't even aware that Git provided such a mechanism.

For repositories that do provide this file, this would definitely solve our issue. Git2 seems to provide bindings to take advantage of Mailmap.

Maybe, we should explore that solution, it may result in a cleaner logic and prevent us from making any wrong assumptions on whether or not an entry is a duplicate.

What do you think @HallerPatrick @yoichi @spenserblack


Some repositories using .mailmap: linux, react, tokei, zulip

@spenserblack
Copy link
Collaborator

Thank you for letting us know about that, @yoichi!

@o2sh I like this approach. Would the merging by author name still be necessary? Or should we merge only on email and leave the responsibility to the repo owner to add a .mailmap to let onefetch know how to merge users?

@o2sh
Copy link
Owner Author

o2sh commented Jun 26, 2021

Or should we merge only on email and leave the responsibility to the repo owner to add a .mailmap to let onefetch know how to merge users?

This I think, which implies reverting #449 and adding a Mailmap routine on top of what is currently in place.

@HallerPatrick
Copy link
Contributor

But isn't mailmapare features not used often? @o2sh mentioned some, but they are comparatively huge projects with a lot of commiters. Wouldn't it be better to check if a .mailmap file exists in root and use the routine and otherwise merge by mail?

@o2sh
Copy link
Owner Author

o2sh commented Jun 26, 2021

You're right @HallerPatrick , the presence of the .mailmap should drive the merging policy, with a fallback to what is currently in place - if no .mailmap is provided.

Even though, to be 100% correct: we shouldn't even be allowed to make the assumption that two commits made with the same email (and different username) corresponds to the same author.

For example, git shortlog relies on username + email in order to merge entries.

@HallerPatrick
Copy link
Contributor

@o2sh You have a point here with the 100% correctness. I mean what we could do is another command line flag which allows for assumptions to merge by email and also write in the docs, that maintainers of their repo could include a .mailmap file for a clean resolution of commits.

@o2sh
Copy link
Owner Author

o2sh commented Jun 27, 2021

...write in the docs, that maintainers of their repo could include a .mailmap file for a clean resolution of commits.

Ok for the docs, not so much about adding a command line flag.

I think we should stick to what git shortlog does - as the de facto standard - and merge entries based on username + email by default with .mailmap mapping if present.

Something in the line of the read_from_stdin function in shortlog.c that calls on parse_stdin_author for the .mailmap matching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants