proposal: unicode/utf16: add DecodeRuneBytes #65511
Comments
I've just noticed that I'd also greatly benefit from adding a func EncodeRuneBytes(dst []byte, r rune) int
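For illustration only, here is a minimal sketch of what such an encoder could look like. The name encodeRuneBytesLE, the fixed little-endian byte order, and the choice to return 0 when dst is too small are all assumptions made for this sketch; none of them come from the proposal. It relies on utf16.AppendRune, which is available since Go 1.20.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeRuneBytesLE is a hypothetical helper, not part of unicode/utf16.
// It writes the UTF-16 encoding of r into dst in little-endian order and
// returns the number of bytes written (2 or 4). It returns 0 if dst is too
// small; that convention is an assumption of this sketch.
func encodeRuneBytesLE(dst []byte, r rune) int {
	var buf [2]uint16
	// AppendRune emits one unit, a surrogate pair, or U+FFFD for invalid runes.
	units := utf16.AppendRune(buf[:0], r)
	if len(dst) < 2*len(units) {
		return 0
	}
	for i, u := range units {
		dst[2*i] = byte(u)        // low byte first (little-endian)
		dst[2*i+1] = byte(u >> 8) // then high byte
	}
	return 2 * len(units)
}

func main() {
	dst := make([]byte, 4)
	n := encodeRuneBytesLE(dst, 0x1F600)
	fmt.Printf("% x\n", dst[:n])
}
```

A big-endian variant would only swap the two byte assignments, which is the byte-order question raised elsewhere in this thread.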
The DecodeRuneBytes function adds a new concept to the utf16 package, namely the encoding of sequences of UTF-16 codes as byte strings, with the concomitant need to specify the byte order. If we make the byte encoding the caller's concern, then can't the problem be solved with code something like this?

    a := next()
    if !utf16.IsSurrogate(a) {
        return a
    }
    b := next()
    if !utf16.IsSurrogate(b) {
        return 0xFFFD
    }
    return utf16.DecodeRune(a, b)

(Replace 'next' with your favorite byte iterator.)
@adonovan Hmm, rune
DecodeRune applies the correct range check to each surrogate, so my example will return the correct rune; however, when it returns U+FFFD it may consume too much: if 'a' was invalid it should not consume 'b'. To fix that we would need not only IsLowSurrogate (that's the usual term for the second unit of a pair) for 'b', but also IsHighSurrogate for 'a'.
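A minimal sketch of the consumption rule described above, assuming the input is already a []uint16. The helpers isLeadSurrogate, isTrailSurrogate, and decodeFirst are hypothetical names introduced here; unicode/utf16 exports only IsSurrogate. The first unit of a pair must lie in 0xD800-0xDBFF and the second in 0xDC00-0xDFFF, and an invalid unit consumes only itself.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

const replacementChar = '\uFFFD'

// Hypothetical range checks; these are not exported by unicode/utf16.
func isLeadSurrogate(u uint16) bool  { return 0xD800 <= u && u < 0xDC00 }
func isTrailSurrogate(u uint16) bool { return 0xDC00 <= u && u < 0xE000 }

// decodeFirst (hypothetical) returns the first rune in s and the number of
// uint16 units consumed. Malformed input yields U+FFFD and consumes exactly
// one unit, so a valid unit following a stray surrogate is never swallowed.
func decodeFirst(s []uint16) (rune, int) {
	if len(s) == 0 {
		return replacementChar, 0
	}
	a := s[0]
	switch {
	case !isLeadSurrogate(a) && !isTrailSurrogate(a):
		return rune(a), 1 // ordinary code unit
	case isLeadSurrogate(a) && len(s) > 1 && isTrailSurrogate(s[1]):
		return utf16.DecodeRune(rune(a), rune(s[1])), 2 // valid pair
	default:
		return replacementChar, 1 // stray or unpaired surrogate
	}
}

func main() {
	s := []uint16{0x0048, 0xDC00, 0x0069, 0xD83D, 0xDE00}
	for len(s) > 0 {
		r, n := decodeFirst(s)
		fmt.Printf("%U consumed %d\n", r, n)
		s = s[n:]
	}
}
```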
CC @dsnet
The utf16 package is meant to be as minimal as possible. If you really need to decode bytes, it is easy to decode them outside the loop and then call DecodeRune with successive pairs of uint16 values.
Also we probably do not want utf16 to import encoding/binary and then depend recursively on many other packages, including reflect. Right now utf16 has no imports at all, and it is important to keep it that way for use on Windows in package syscall.
This proposal has been declined as infeasible.
Proposal Details
Hello peeps. I'd like to propose extracting part of the existing code in the unicode/utf16 package into a function.
In particular these lines: https://github.com/golang/go/blob/master/src/unicode/utf16/utf16.go#L116-L130
Why?
On existing utf16.DecodeRune
utf16.DecodeRune already exists, but it is tricky to use with the existing API. One can't use it in the same way as the well-designed utf8.DecodeRune, since utf16.DecodeRune expects one to have already decided whether the first element of the []uint16 is a single "self" code unit or the start of a surrogate pair, and there's no API that helps one make this decision!
It is my understanding that utf16.DecodeRune not having the following signature was a missed opportunity to solve the issues I'm currently having:
How?
Adding a package level function. Unverified and untested function logic included for visual purposes.
Note: Here's an example of the API implemented in a package and its usage.