Normalize oauth email username #28561
models/user/user_test.go
testCases := [][]interface{}{
	// input name, expected normalized name, is normalized name valid
	{"test", "test", true},
	{"Sinéad.O'Connor", "Sin-ad.OConnor", true}, // We should consider allowing custom replacement characters (eg. é -> e)
I guess it's necessary to do so... maybe a Go library also has the ability to do so. Otherwise `Sin-ad` doesn't seem good for a username.
Another reason why it's necessary: maybe nobody likes making "breaking" changes again and again.
Think about a case: if this PR doesn't have a complete solution and, in the future, someone decides to "consider allowing replacement characters", what could they do? They would need to introduce "option_v2", "option_v3"... it would be a mess.
There was some discussion in Discord about this too.
I'm not sure that we want to take on normalizing every possible user input the way that a user would want or expect. There are hundreds of thousands of characters we'd have to transform from Unicode to ASCII. The most accessible method would be to extend this solution in the future with either another config setting, like `NORMALIZE_CHARS`, or add it directly to the auth source form.
However... if we want to take it on, there is a pretty good lookup table here we could transpose from javascript: https://web.archive.org/web/20120918093154/http://lehelk.com/2011/05/06/script-to-remove-diacritics/
I'm happy to go either way, but if we do the first method, we can follow up with a future, non-breaking PR.
I guess the problem with the first option is that admins could easily break user account external linking if they change the character replacements and then restart the server, right?
The unicode standard has defined it.
That article is specifically about characters that can be decomposed into a base character plus diacritics. So it should work for characters with accents (though the code they provide is out of date and needs to be modified), but it is neither a general solution nor a complete "normalization" imo.
I think there is a good reason other software services don't do this (eg.): we are now making some assumptions about how users' names should be transformed, and it's only one step in that direction, which could potentially be endless. For example, `Ægidius` will still become `-gidius`, which will fail user creation.
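To make the failure mode concrete, here is a minimal sketch of blanket `-` replacement. The helper `dashNormalize` and the `validUsername` pattern are hypothetical stand-ins written for this illustration, not the code in this PR; the validity rule in particular is only an assumption (ASCII letters, digits, `-`, `_`, `.`, starting and ending with a letter or digit).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validUsername is an assumed rule for illustration only: ASCII letters,
// digits, '-', '_' and '.', starting and ending with a letter or digit.
var validUsername = regexp.MustCompile(`^[a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?$`)

// isAlnum reports whether r is an ASCII letter or digit.
func isAlnum(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
}

// dashNormalize is a hypothetical helper: it drops apostrophes and replaces
// every other character outside the allowed set with '-'.
func dashNormalize(name string) string {
	var b strings.Builder
	for _, r := range name {
		switch {
		case r == '\'':
			// apostrophes are dropped entirely
		case isAlnum(r) || r == '-' || r == '_' || r == '.':
			b.WriteRune(r)
		default:
			b.WriteByte('-')
		}
	}
	return b.String()
}

func main() {
	for _, in := range []string{"Sinéad.O'Connor", "Ægidius"} {
		out := dashNormalize(in)
		fmt.Printf("%q -> %q valid=%v\n", in, out, validUsername.MatchString(out))
	}
}
```

Under these assumptions a name whose first character is outside the allowed set ends up with a leading `-`, so it is rejected even after normalization.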
After putting some thought into it, I agree that we should make the change you suggest, as it solves an additional user issue. However, if we choose to go in this direction, I suggest we do NOT replace unknown characters with `-`. The reasons being:
- As you say, that could result in breaking changes in the future if we decide to replace additional characters. Instead, we can safely add characters to the replacement set selectively.
- This solves all of the open issues, while providing significant, incremental improvement.
- It's less likely that we are silently hiding a user configuration error.
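The reasoning in this list can be sketched as follows. This is only an illustration: the table contents, `selectiveNormalize`, and `isASCIIUsername` are hypothetical names and values, not what the PR ships. Only listed characters are rewritten; anything unknown is left untouched, so it still fails validation visibly instead of being turned into `-`.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements is a deliberately small, hypothetical table. Adding an entry
// later only affects names that previously failed validation; names that
// already normalized successfully keep the same result.
var replacements = strings.NewReplacer(
	"\u00e9", "e", // é
	"\u00f6", "o", // ö
	"\u00c6", "AE", // Æ
	"'", "", // apostrophes are dropped
)

// selectiveNormalize rewrites only characters listed in the table and
// leaves every other character as it is.
func selectiveNormalize(name string) string {
	return replacements.Replace(name)
}

// isASCIIUsername is an assumed validity check for illustration: non-empty,
// and only ASCII letters, digits, '-', '_' and '.'.
func isASCIIUsername(name string) bool {
	if name == "" {
		return false
	}
	for _, r := range name {
		alnum := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
		if !alnum && r != '-' && r != '_' && r != '.' {
			return false
		}
	}
	return true
}

func main() {
	for _, in := range []string{"Sin\u00e9ad.O'Connor", "\u00c6gidius", "J\u00fcrgen"} {
		out := selectiveNormalize(in)
		fmt.Printf("%q -> %q valid=%v\n", in, out, isASCIIUsername(out))
	}
}
```

In this sketch `ü` is intentionally missing from the table, so the third name comes back unchanged and is reported as invalid, which surfaces the problem rather than hiding it.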
001a679 to 39cd778
* giteaofficial/main:
  Normalize oauth email username (go-gitea#28561)
  Fix wrapping of label list (go-gitea#28684)
  Fix grammar in `actions.variables.id_not_exist` (en-US) (go-gitea#28680)
  Fix grammar issues on the repository Actions page (en-US) (go-gitea#28679)
  Fix tooltip of variable edit button (go-gitea#28681)
  Make cross-reference issue links work in markdown documents again (go-gitea#28682)
Closes #28461
I ran into a couple of potential solutions here:
- … `matthias.schöpfer` would become `matthias.sch-pfer`), `Ægidius` -> `-gidius`
- …), and a set of custom replacements (eg. `Æ` -> `AE`)
There doesn't seem to be a well-supported Go package that does Unicode -> ASCII normalization. There are some other solutions, like this, that build a relatively complete replacement map. That would probably be more complete and somewhat more performant, but it's also way overkill for us and, as I mentioned previously, once we replace a character, changing that replacement is a breaking change for users. If illegal characters continue to be an issue and we have to revisit this, we can use a solution like that.