
regexp.ReplaceAllString is removing the accents of a normalized string #46108


Closed
esdrasbeleza opened this issue May 11, 2021 · 1 comment


esdrasbeleza commented May 11, 2021

What version of Go are you using (go version)?

$ go version
go1.16.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/esdrasbeleza/Library/Caches/go-build"
GOENV="/Users/esdrasbeleza/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/esdrasbeleza/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/esdrasbeleza/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.16.3/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.16.3/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/esdrasbeleza/dev/array/monorepo/golang/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/v_/4ltj2w9n2l525tl1n_shy5vm0000gn/T/go-build3161634107=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I'm implementing a solution that uses a regexp to remove some special characters from a string.
The string is supposed to be normalized before the removal happens.
I was able to reproduce the problem in the playground.

What did you expect to see?

My regexp operation is supposed to replace every character that is not a Unicode letter, number, dash, underscore, or space with an empty string, giving a result like Bákè123 -_.

What did you see instead?

ReplaceAllString also replaces my accented letters with the same letters without accents, so I see Bake123 -_.

@ianlancetaylor
Contributor

The regexp package always works in terms of UTF-8 encoded code points. It doesn't work in terms of normalized strings. It doesn't support the kind of operation that you need. Sorry.
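To illustrate the code-point behavior described above, here is a minimal sketch. It assumes the input was normalized to a decomposed form (such as NFD), so that each accent is a separate combining-mark code point following its base letter. The original playground snippet is not shown in this thread, so both patterns and both helper names (`stripLettersOnly`, `stripKeepingMarks`) are hypothetical. A combining mark belongs to Unicode category M, not L, so a class that lists only `\p{L}` matches it and removes it, which looks like the accent being stripped.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for illustration; not taken from the original report.
var (
	// Removes everything that is not a letter, number, underscore, space, or dash.
	lettersOnly = regexp.MustCompile(`[^\p{L}\p{N}_ -]`)
	// Same class, but also keeps combining marks (Unicode category M).
	withMarks = regexp.MustCompile(`[^\p{L}\p{M}\p{N}_ -]`)
)

// stripLettersOnly drops combining marks along with other unwanted characters.
func stripLettersOnly(s string) string {
	return lettersOnly.ReplaceAllString(s, "")
}

// stripKeepingMarks preserves combining marks so decomposed accents survive.
func stripKeepingMarks(s string) string {
	return withMarks.ReplaceAllString(s, "")
}

func main() {
	// "Bákè123 -_!" written in decomposed form: a + U+0301, e + U+0300.
	decomposed := "Ba\u0301ke\u0300123 -_!"

	// %+q escapes non-ASCII code points, making the marks visible.
	fmt.Printf("%+q\n", stripLettersOnly(decomposed))  // "Bake123 -_"
	fmt.Printf("%+q\n", stripKeepingMarks(decomposed)) // "Ba\u0301ke\u0300123 -_"
}
```

Under the same assumption, another option would be to recompose the string (for example to NFC) before matching, so that each accented letter is a single code point in category L. Either way, the regexp itself still only sees individual code points.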

@golang golang locked and limited conversation to collaborators May 11, 2022