Normalize oauth email username #28561

kdumontnu · 2023-12-20T15:15:58Z

I ran into a couple potential solutions here:

We could follow GitHub/Sourcegraph and replace any character outside of our supported character set with -.
- The problem here is that there are still some invalid characters that would result in "broken" usernames (eg. here matthias.schöpfer would become matthias.sch-pfer)
- This could silently hide mask some user configuration problems
- The change will be breaking if we ever change it (eg. we are safe to replace more characters, but we will break user linking if we change character replacement in the future).
We could do the above, but go one step further and remove diacritics before replacement
- This has the same downsides as above, but aims to solve one more reported issue
- Still doesn't handle some cases, like Ægidius -> -gidius
[Implemented] Take a more selective approach that can be expanded in the future. Create a set of known characters that should be removed, set of characters that should be replaced (with -), and a set of custom replacements (eg Æ -> AE)
- This doesn't have the downside of hiding potential user configuration errors
- It's safe to add to the set in the future (or even move to one of the more aggressive approaches above)

There doesn't seem to be a well-supported go package that does unicode -> ascii normalization. There are some other solutions, like this, that build a relatively complete replacement map. That would probably be more complete and somewhat more performant. However, it's also way overkill for us, and as I mentioned previously, once we replace a character, changing that will be a breaking change for users. However, if illegal characters continue to be an issue and we have to revisit this, we can use a solution like that.

docs/content/administration/config-cheat-sheet.en-us.md

models/user/user_test.go

routers/web/auth/auth.go

models/user/user.go

custom/conf/app.example.ini

wxiaoguang · 2023-12-20T15:50:44Z

models/user/user_test.go

+	testCases := [][]interface{}{
+		// input name, expected normalized name, is normalized name valid
+		{"test", "test", true},
+		{"Sinéad.O'Connor", "Sin-ad.OConnor", true}, // We should consider allowing custom replacement characters (eg. é -> e)


I guess it's necessary to do so .... maybe the Go library also has ability to do so.

Otherwise Sin-ad doesn't seem good for a username.

Another reason why it's necessary: maybe nobody likes making "breaking" changes again and again.

Think about a case: if this PR doesn't have a complete solution, in the future, if someone decides to "consider allowing replacement characters", what they could do? They would needs to introduce "option_v2", "option_v3" .... it would be a mess.

There was some discussion in discord about this too.

I'm not sure that we want to take on normalizing every possible user input the way that a user would want/expect. There are ~~hundreds~~ thousands of characters we'd have to transform from unicode to ascii. The most accessible method would be to extend this solution in the future with either another config setting, like NORMALIZE_CHARS, or add it directly to the auth source form.

However... if we want to take it on, there is a pretty good lookup table here we could transpose from javascript: https://web.archive.org/web/20120918093154/http://lehelk.com/2011/05/06/script-to-remove-diacritics/

I'm happy to go either way, but if we do the first method, we can follow up with a future, non-breaking PR.

I guess the problem with the first option is that admins could easily break user account external linking if they change the character replacements and then restart the server, right?

The unicode standard has defined it.

https://go.dev/blog/normalization

That article is specifically for characters that can be decomposed with diacritics. So, it should work for characters with accents (though the code they provide is out of date and needs to be modified), but is not a general-solution nor a complete "normalization" imo.

I think there is a good reason other software services don't do this (eg.): we are now making some assumptions about how users' names should be transformed, and it's only one step in that direction, which could potentially be endless. For example, Ægidius will still become -gidius, which will fail user creation.

After putting some thought into it I agree that we should the change you suggest, as it solves an additional user issue. However, if we choose to go this direction I suggest we do NOT replace unknown characters with -. The reasons being:

As you say, that could result in breaking changes in the future if we decide to replace additional characters. Instead, we selectively add characters into the replacement set safely.

This solves all of the open issues, while providing significant, incremental improvement

It's less likely that we are silently hiding a user configuration error

* giteaofficial/main: Normalize oauth email username (go-gitea#28561) Fix wrapping of label list (go-gitea#28684) Fix grammar in `actions.variables.id_not_exist` (en-US) (go-gitea#28680) Fix grammar issues on the repository Actions page (en-US) (go-gitea#28679) Fix tooltip of variable edit button (go-gitea#28681) Make cross-reference issue links work in markdown documents again (go-gitea#28682)

GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Dec 20, 2023

pull-request-size bot added the size/M label Dec 20, 2023

github-actions bot added the modifies/docs label Dec 20, 2023

jolheiser requested changes Dec 20, 2023

View reviewed changes

docs/content/administration/config-cheat-sheet.en-us.md Outdated Show resolved Hide resolved

models/user/user_test.go Outdated Show resolved Hide resolved

models/user/user_test.go Outdated Show resolved Hide resolved

GiteaBot added lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Dec 20, 2023

delvh reviewed Dec 20, 2023

View reviewed changes

routers/web/auth/auth.go Outdated Show resolved Hide resolved

routers/web/auth/auth.go Outdated Show resolved Hide resolved

models/user/user.go Outdated Show resolved Hide resolved

wxiaoguang reviewed Dec 20, 2023

View reviewed changes

custom/conf/app.example.ini Outdated Show resolved Hide resolved

wxiaoguang reviewed Dec 20, 2023

View reviewed changes

kdumontnu added 9 commits December 31, 2023 13:12

Add new setting for email normalization

aaffc69

Add normalization option for Oauth2 email

3c47ed0

Update docs

974a5be

Code review suggestions

a41bd17

Remove diacritics

8bf5e68

Add developer warning

d703297

Clean up

e947e57

Move to opt-in character replacement

e989db0

Remove option for normalization

6ecdea1

kdumontnu force-pushed the kd/normalize_oauth_email_username branch 2 times, most recently from 001a679 to 39cd778 Compare December 31, 2023 23:44

kdumontnu added 3 commits December 31, 2023 18:45

Add support for additional replacement characers

39cd778

Fix messages

4953e29

Merge branch 'main' into kd/normalize_oauth_email_username

1902f6b

kdumontnu requested a review from jolheiser January 2, 2024 14:54

wxiaoguang approved these changes Jan 2, 2024

View reviewed changes

kdumontnu added 2 commits January 2, 2024 14:42

Merge branch 'main' into kd/normalize_oauth_email_username

ac47027

Merge branch 'main' into kd/normalize_oauth_email_username

04d9d80

delvh approved these changes Jan 3, 2024

View reviewed changes

jolheiser approved these changes Jan 4, 2024

View reviewed changes

GiteaBot removed the lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged label Jan 4, 2024

GiteaBot added the lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. label Jan 4, 2024

jolheiser merged commit 54acf7b into go-gitea:main Jan 4, 2024

GiteaBot added this to the 1.22.0 milestone Jan 4, 2024

kdumontnu deleted the kd/normalize_oauth_email_username branch January 4, 2024 01:36

fuxiaohei pushed a commit to fuxiaohei/gitea that referenced this pull request Jan 17, 2024

Normalize oauth email username (go-gitea#28561)

1a5efa3

silverwind pushed a commit to silverwind/gitea that referenced this pull request Feb 20, 2024

Normalize oauth email username (go-gitea#28561)

eb3f42e

github-actions bot locked as resolved and limited conversation to collaborators Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Normalize oauth email username #28561

Normalize oauth email username #28561

Uh oh!

kdumontnu commented Dec 20, 2023 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wxiaoguang Dec 20, 2023

Uh oh!

wxiaoguang Dec 20, 2023 •

edited

Loading

Uh oh!

kdumontnu Dec 20, 2023 •

edited

Loading

Uh oh!

wxiaoguang Dec 21, 2023

Uh oh!

kdumontnu Dec 31, 2023 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Normalize oauth email username #28561

Normalize oauth email username #28561

Uh oh!

Conversation

kdumontnu commented Dec 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wxiaoguang Dec 20, 2023

Choose a reason for hiding this comment

Uh oh!

wxiaoguang Dec 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kdumontnu Dec 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wxiaoguang Dec 21, 2023

Choose a reason for hiding this comment

Uh oh!

kdumontnu Dec 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kdumontnu commented Dec 20, 2023 •

edited

Loading

wxiaoguang Dec 20, 2023 •

edited

Loading

kdumontnu Dec 20, 2023 •

edited

Loading

kdumontnu Dec 31, 2023 •

edited

Loading