unicode/utf8: RuneError should be a 4-byte value -- something uncreatable via DecodeRune's valid returns #47826

Closed
@BrannonKing


What version of Go are you using (go version)?

1.16.6

Does this issue reproduce with the latest release?

I have not tried 1.17.x

What operating system and processor architecture are you using (go env)?

linux on amd64

What did you do?

I've been doing my own case folding of runes (because of bugs in x/text/cases). I have this function:

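import (
	"bytes"
	"unicode/utf8"
)

// foldMap is defined elsewhere; it is assumed here to be a map[rune][]rune
// from a rune to its case-folded replacement runes.

// MyCaseFold returns the case-folded form of name, or name unchanged
// if it contains invalid UTF-8.
func MyCaseFold(name []byte) []byte {
	var b bytes.Buffer
	b.Grow(len(name))
	for i := 0; i < len(name); {
		r, w := utf8.DecodeRune(name[i:])
		// A decoding error has width 0 or 1; a validly encoded U+FFFD
		// also equals utf8.RuneError but has width 3.
		if r == utf8.RuneError && w < 2 {
			return name
		}
		replacements := foldMap[r]
		if len(replacements) > 0 {
			for j := range replacements {
				b.WriteRune(replacements[j])
			}
		} else {
			// No folding entry: keep the rune as is.
			b.WriteRune(r)
		}
		i += w
	}
	return b.Bytes()
}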

What did you expect to see?

I expected to be able to do the same with for _, r := range string(name). However, I happened to have this UTF-8 byte string (in hex): 43efbfbd. Its last three bytes are valid UTF-8 (utf8.Valid agrees), and they decode to U+FFFD, which is the same value as utf8.RuneError. I had to add the w < 2 check, which I arrived at after reading the DecodeRune source: on a decoding error, RuneError is only ever returned with a byte width of 0 or 1.
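To make the ambiguity concrete, here is a minimal sketch that decodes a valid U+FFFD, an invalid byte, and an empty slice. The helper name isDecodeError is made up for illustration; only utf8.DecodeRune and utf8.RuneError come from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isDecodeError (hypothetical helper) reports whether the first rune in b
// failed to decode. DecodeRune signals failure as (RuneError, 0) for empty
// input and (RuneError, 1) for an invalid encoding; a genuine U+FFFD
// decodes with width 3.
func isDecodeError(b []byte) bool {
	r, w := utf8.DecodeRune(b)
	return r == utf8.RuneError && w < 2
}

func main() {
	inputs := [][]byte{
		{0xef, 0xbf, 0xbd}, // valid encoding of U+FFFD
		{0xff},             // invalid byte
		{},                 // empty input
	}
	for _, in := range inputs {
		r, w := utf8.DecodeRune(in)
		fmt.Printf("% x -> rune=%U width=%d error=%t\n", in, r, w, isDecodeError(in))
	}
}
```

All three inputs yield the rune U+FFFD; only the width (3, 1, and 0 respectively) tells them apart.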

utf8.RuneError should be a value that cannot be created via one of the non-error code-paths in DecodeRune.
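For the range form, which does not expose the byte width, one possible workaround is to compare the underlying bytes whenever the loop yields U+FFFD. This is only a sketch under that assumption; firstInvalidIndex is a hypothetical helper, not part of unicode/utf8.

```go
package main

import (
	"fmt"
	"strings"
)

// firstInvalidIndex (hypothetical helper) returns the byte offset of the
// first invalid UTF-8 sequence in s, or -1 if s is entirely valid.
func firstInvalidIndex(s string) int {
	for i, r := range s {
		// range yields U+FFFD both for a real U+FFFD and for a bad byte;
		// only the real one is backed by the bytes EF BF BD.
		if r == '\uFFFD' && !strings.HasPrefix(s[i:], "\uFFFD") {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(firstInvalidIndex("C\uFFFD")) // valid: contains a literal U+FFFD
	fmt.Println(firstInvalidIndex("C\xff"))   // invalid byte at offset 1
}
```

The first call prints -1 and the second prints 1, so the string from this report (43efbfbd) is accepted while a stray 0xff byte is still caught.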


Labels

FrozenDueToAge, NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)
