proposal: encoding/json: avoid massive escape costs #68203

Closed
@karalabe

Proposal Details

It seems the json encoder and decoder have significant overhead when escaping strings. I've attached a set of benchmarks at the end of this report. In short, I have a large string (hex in this case) that I would like to insert into a JSON field. Each benchmark JSON-encodes just that single hex value.

BenchmarkMarshalString-12             	     162	   7383418 ns/op
BenchmarkMarshalRawJSON-12            	      42	  28148749 ns/op
BenchmarkMarshalTexter-12             	     153	   7785682 ns/op
BenchmarkMarshalJsoner-12             	      40	  28960272 ns/op
BenchmarkMarshalCopyString-12         	    4141	    263625 ns/op
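
To make the gap easier to read, the timings above can be converted to throughput. This is only a rough sketch: it assumes the work per operation is the encoded output (4194304 source bytes become 8388608 hex characters, plus two quote bytes) and counts 1 GB as 10^9 bytes. The helper name `gbPerSec` is made up for this illustration.

```go
package main

import "fmt"

// gbPerSec converts a payload size in bytes and a benchmark time in ns/op
// into throughput in GB/s (one byte per nanosecond equals 10^9 bytes/second).
func gbPerSec(payloadBytes int, nsPerOp float64) float64 {
	return float64(payloadBytes) / nsPerOp
}

func main() {
	// Assumed payload: 8388608 hex characters plus the two surrounding quotes.
	const payload = 2*4194304 + 2

	// Timings copied from the benchmark output in this report.
	results := []struct {
		name    string
		nsPerOp float64
	}{
		{"MarshalString", 7383418},
		{"MarshalRawJSON", 28148749},
		{"MarshalTexter", 7785682},
		{"MarshalJsoner", 28960272},
		{"MarshalCopyString", 263625},
	}
	for _, r := range results {
		fmt.Printf("%-18s %6.2f GB/s\n", r.name, gbPerSec(payload, r.nsPerOp))
	}
}
```

With these numbers the plain copy lands at roughly 32 GB/s, the string and MarshalText paths at roughly 1.1 GB/s, and the RawMessage and MarshalJSON paths at roughly 0.3 GB/s.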

I would expect the performance to be close to the speed of copying the data. However, Go seems to do a lot of extra processing. This report questions several parts of that:

  • I can imagine Go wanting to double-check the content of a string, but in that case it would be nice to have a way to tell the json encoder/decoder that I know the content is valid, so it can just process it without wasting a ton of time.
  • I expected RawMessage to skip all the pre- and post-processing, but alas, Go elegantly ignores that it's "raw" and still does everything.
  • Annoyingly enough, for types that implement MarshalJSON, it seems the escaping runs 3 (!!!) times. I haven't found the 3rd one, but I think two of them are https://github.com/golang/go/blob/master/src/encoding/json/encode.go#L587 and the line right after, where both lines make an appendString call, which internally does the escape checks (yeah, the noescape flag only disables HTML escape checking, not ASCII escape checking).
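
The RawMessage point can be observed without a benchmark. The sketch below passes a `json.RawMessage` through `json.Marshal` and prints the input next to the output; any difference means the encoder walked and rewrote the "raw" bytes. The helper `marshalRaw` is a name made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalRaw marshals a json.RawMessage through json.Marshal and returns the
// output as a string, so input and output can be compared directly.
func marshalRaw(raw string) (string, error) {
	out, err := json.Marshal(json.RawMessage(raw))
	return string(out), err
}

func main() {
	inputs := []string{
		`{ "a" : 1 }`, // insignificant whitespace
		`"<b>"`,       // characters subject to HTML escaping
	}
	for _, in := range inputs {
		out, err := marshalRaw(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("in:  %s\nout: %s\n", in, out)
	}
}
```

The first input comes back compacted as `{"a":1}` and the second comes back as `"\u003cb\u003e"`, so the bytes are validated, compacted, and HTML-escaped rather than copied through.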

I'm not even entirely sure what the solution to the various issues is.

  • I'd expect to be able to use the json package without escaping.
  • I'd expect RawMessage to not be post-processed.
  • I'd expect the escaping code to be fast, and not take more time than encoding all the fields.
  • I'd expect encoding to run once, not 3 times.

// Package name is arbitrary; save this as a _test.go file and run `go test -bench .`
package jsonbench

import (
	"bytes"
	"encoding/hex"
	"encoding/json"
	"testing"
)

func BenchmarkMarshalString(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	str := hex.EncodeToString(src)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(str)
	}
}

func BenchmarkMarshalRawJSON(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	msg := json.RawMessage(`"` + hex.EncodeToString(src) + `"`)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(msg)
	}
}

func BenchmarkMarshalTexter(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	txt := &Texter{str: hex.EncodeToString(src)}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(txt)
	}
}

func BenchmarkMarshalJsoner(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	jsn := &Jsoner{str: hex.EncodeToString(src)}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(jsn)
	}
}

func BenchmarkMarshalCopyString(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	str := hex.EncodeToString(src)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		buf := make([]byte, len(str)+2)
		buf[0] = '"'
		copy(buf[1:], str)
		buf[len(buf)-1] = '"'
	}
}

type Texter struct {
	str string
}

func (t Texter) MarshalText() ([]byte, error) {
	return []byte(t.str), nil
}

type Jsoner struct {
	str string
}

func (j Jsoner) MarshalJSON() ([]byte, error) {
	return []byte(`"` + j.str + `"`), nil
}
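
For comparison, here is a minimal sketch of what BenchmarkMarshalCopyString effectively models: assembling the JSON by hand when the content is known to need no escaping. It is a workaround under stated assumptions, not something the json package offers. The helper names `appendHexString` and `buildDoc` and the field name "data" are hypothetical.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// appendHexString appends data to dst as a quoted JSON string of hex digits.
// Hex digits never need JSON escaping, so no escape scan is performed.
func appendHexString(dst, data []byte) []byte {
	dst = append(dst, '"')
	start := len(dst)
	// Extend dst by the encoded length, then encode in place.
	dst = append(dst, make([]byte, hex.EncodedLen(len(data)))...)
	hex.Encode(dst[start:], data)
	return append(dst, '"')
}

// buildDoc hand-assembles {"data":"<hex>"}. The field name "data" is a
// hypothetical example, not something taken from the report.
func buildDoc(data []byte) []byte {
	dst := make([]byte, 0, len(`{"data":""}`)+hex.EncodedLen(len(data)))
	dst = append(dst, `{"data":`...)
	dst = appendHexString(dst, data)
	return append(dst, '}')
}

func main() {
	doc := buildDoc([]byte{0xde, 0xad, 0xbe, 0xef})
	// json.Valid is used here only as a sanity check on the hand-built output.
	fmt.Println(string(doc), json.Valid(doc))
}
```

Note that wrapping such hand-built bytes in a json.RawMessage and handing them back to json.Marshal would reintroduce the processing this report complains about, so the sketch only helps when the whole document is written out directly.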