-
Notifications
You must be signed in to change notification settings - Fork 18.1k
encoding/json: unexpected result when json.Unmarshaling #38126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I've confirmed this is a duplicate of #38105. I'll close this issue to consolidate into the other one, but I'll add this as a test case. |
Change https://golang.org/cl/226218 mentions this issue: |
It seems that https://golang.org/cl/226218 has fix this issue. However, I still have a question. type T struct {
F1 string `json:"F1,string"`
}
t := T {
"aaa\tbbb",
}
b, _ := json.Marshal(t)
fmt.Println(string(b)) Before go 1.14, its output result is
Add after go 1.14, it changes to
It seems this change is introduced by this commit but may I ask why you change the encode behavior for string type struct field with ",string" option like this? Which of them excatly match the godoc's description?
|
You're right, @AllenX2018, something is up with this change in behavior. Can you please file a separate issue? I'm already investigating a fix, but it's good to have an issue to track the bug. |
I've created #38173 for track |
The added comment contains some context. The original optimization assumed that each call to unquoteBytes (or unquote) followed its corresponding call to rescanLiteral. Otherwise, unquoting a literal might use d.safeUnquote from another re-scanned literal. Unfortunately, this assumption is wrong. When decoding {"foo": "bar"} into a map[T]string where T implements TextUnmarshaler, the sequence of calls would be as follows: 1) rescanLiteral "foo" 2) unquoteBytes "foo" 3) rescanLiteral "bar" 4) unquoteBytes "foo" (for UnmarshalText) 5) unquoteBytes "bar" Note that the call to UnmarshalText happens in literalStore, which repeats the work to unquote the input string literal. But, since that happens after we've re-scanned "bar", we're using the wrong safeUnquote field value. In the added test case, the second string had a non-zero number of safe bytes, and the first string had none since it was all non-ASCII. Thus, "safely" unquoting a number of the first string's bytes could cut a rune in half, and thus mangle the runes. A rather simple fix, without a full revert, is to only allow one use of safeUnquote per call to unquoteBytes. Each call to rescanLiteral when we have a string is soon followed by a call to unquoteBytes, so it's no longer possible for us to use the wrong index. Also add a test case from #38126, which is the same underlying bug, but affecting the ",string" option. Before the fix, the test would fail, just like in the original two issues: --- FAIL: TestUnmarshalRescanLiteralMangledUnquote (0.00s) decode_test.go:2443: Key "开源" does not exist in map: map[开���:12345开源] decode_test.go:2458: Unmarshal unexpected error: json: invalid use of ,string struct tag, trying to unmarshal "\"aaa\tbbb\"" into string Fixes #38105. For #38126. Change-Id: I761e54924e9a971a4f9eaa70bbf72014bb1476e6 Reviewed-on: https://go-review.googlesource.com/c/go/+/226218 Run-TryBot: Daniel Martí <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Joe Tsai <[email protected]>
Change https://golang.org/cl/233057 mentions this issue: |
…e case when decoding The added comment contains some context. The original optimization assumed that each call to unquoteBytes (or unquote) followed its corresponding call to rescanLiteral. Otherwise, unquoting a literal might use d.safeUnquote from another re-scanned literal. Unfortunately, this assumption is wrong. When decoding {"foo": "bar"} into a map[T]string where T implements TextUnmarshaler, the sequence of calls would be as follows: 1) rescanLiteral "foo" 2) unquoteBytes "foo" 3) rescanLiteral "bar" 4) unquoteBytes "foo" (for UnmarshalText) 5) unquoteBytes "bar" Note that the call to UnmarshalText happens in literalStore, which repeats the work to unquote the input string literal. But, since that happens after we've re-scanned "bar", we're using the wrong safeUnquote field value. In the added test case, the second string had a non-zero number of safe bytes, and the first string had none since it was all non-ASCII. Thus, "safely" unquoting a number of the first string's bytes could cut a rune in half, and thus mangle the runes. A rather simple fix, without a full revert, is to only allow one use of safeUnquote per call to unquoteBytes. Each call to rescanLiteral when we have a string is soon followed by a call to unquoteBytes, so it's no longer possible for us to use the wrong index. Also add a test case from #38126, which is the same underlying bug, but affecting the ",string" option. Before the fix, the test would fail, just like in the original two issues: --- FAIL: TestUnmarshalRescanLiteralMangledUnquote (0.00s) decode_test.go:2443: Key "开源" does not exist in map: map[开���:12345开源] decode_test.go:2458: Unmarshal unexpected error: json: invalid use of ,string struct tag, trying to unmarshal "\"aaa\tbbb\"" into string Fixes #38106. For #38105. For #38126. Change-Id: I761e54924e9a971a4f9eaa70bbf72014bb1476e6 Reviewed-on: https://go-review.googlesource.com/c/go/+/226218 Run-TryBot: Daniel Martí <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Joe Tsai <[email protected]> (cherry picked from commit 55361a2) Reviewed-on: https://go-review.googlesource.com/c/go/+/233057 Run-TryBot: Dmitri Shuralyov <[email protected]> Reviewed-by: Daniel Martí <[email protected]>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes, it reproduces with the latest release.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
My struct has a
string
field with ajson:"string"
field tag. When this field contains escaped character, I get unexcepted result when doing roundtrip test.What did you expect to see?
What did you see instead?
The text was updated successfully, but these errors were encountered: