Skip to content

Expose Unescape + AppendUnescape helper functions #62

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 18, 2021

Conversation

lab176
Copy link
Contributor

@lab176 lab176 commented Feb 16, 2021

Similar to #60, this pr introduces and exposes Unescape() and AppendUnescape helper functions.
These simply mirror and inverse the recently exposed Escape and Unescape functions.

@lab176 lab176 changed the title Expose Unescape + AppendUnescape helper functions wip: Expose Unescape + AppendUnescape helper functions Feb 16, 2021
@lab176 lab176 changed the title wip: Expose Unescape + AppendUnescape helper functions Expose Unescape + AppendUnescape helper functions Feb 16, 2021
json/json.go Outdated
// AppendUnescape appends s to b with the string unescaped as a JSON value.
// This will include the starting and ending quote characters, and the
// appropriate characters will be escaped correctly for JSON encoding.
func AppendUnescape(b []byte, s string, flags ParseFlags) []byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should s be of type []byte ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're fast..
Likely.. it's awkward here but I figured i'd keep the same pattern as the above Escape and AppendEscape functionality. Open to reversing it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'm suggesting it is because Escape returns a []byte, so it's a bit strange to have the reverse function work with a string. Besides the inconsistency, we'd end up forcing a type conversion on the program if it needs to escape and unescape.

This would also remove the need to do the string-to-bytes conversion in your implementation, which requires a heap allocation and copy of the content.

@lab176 lab176 marked this pull request as ready for review February 17, 2021 15:52
@lab176
Copy link
Contributor Author

lab176 commented Feb 17, 2021

It seems I don't have access to snyk to investigate these failing tests, will look around for what these are doing.

update: seems like they resolved themselves

@lab176 lab176 requested a review from chriso February 17, 2021 17:33
json/json.go Outdated
// appropriate characters will be escaped correctly as if JSON decoded.
func AppendUnescape(b []byte, s []byte, flags ParseFlags) []byte {
d := decoder{flags: flags}
b, _ = d.decodeString(s, unsafe.Pointer(&s))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should one of the arguments to decodeString be b? I'm not seeing how the storage allocated for b would actually be written to if we're not pushing it down the call stack. 🤔

json/json.go Outdated
func AppendUnescape(b []byte, s []byte, flags ParseFlags) []byte {
d := decoder{flags: flags}
b, _ = d.decodeString(s, unsafe.Pointer(&s))
return s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thanks @stevevls for your call out.

This looks wrong doesn't it? Assigning b and returning s?

@lab176 lab176 force-pushed the lbaier/expose-unescape-helpers branch from 82f8e22 to a76b69a Compare February 17, 2021 21:50
b, _ = d.decodeString(s, unsafe.Pointer(&s))
return s
buf := new(string)
d.decodeString(s, unsafe.Pointer(buf))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with this code, so bear with me here. 😁 Is it possible to decode straight into b to avoid allocating to buf and then copying?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unlike encodeString(), decodeString() does not append the result string to the buffer passed it, rather it simply overwrites the string at the pointer passed in:
*(*string)(p) = string(s), where p is the passed in buffer. I'd like to avoid modifying the underlying function for this simple change, but perhaps there are options I am be missing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe using decodeString isn't the right approach, have you looked into parseStringUnquote ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you'd have to reimplement the handling of parsing flag, maybe that's not the right path.

Copy link
Contributor

@achille-roussel achille-roussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could add a benchmark to validate that there are benefits to using json.AppendUnescape compared to json.Unmarshal ?

@lab176
Copy link
Contributor Author

lab176 commented Feb 17, 2021

Do you think we could add a benchmark to validate that there are benefits to using json.AppendUnescape compared to json.Unmarshal ?

Yeah I'll do this, good call.
fwiw, our specific use case will not use AppendUnescape- we simply need to unescape individual fields pulled from the json object, as the optimization is specifically to avoid processing the entire object. I included it to pattern match the exposed Escape functions. I'll add the benchmark, but perhaps it can be avoided until its necessary.

@achille-roussel
Copy link
Contributor

Often times, these can be a good enough approaches:

var v struct {
  Field string
}

json.Unmarshal(b, &v) // will only unmarshal `Field` from a large JSON payload
var s = []byte(`"hello world!\n"`)
var v string

json.Unmarshal(s, &v) // this will unescape the JSON string `s`

@lab176
Copy link
Contributor Author

lab176 commented Feb 18, 2021

benchmark results:

BenchmarkUnmarshalField-12    	 2908604	       425 ns/op	     256 B/op	       3 allocs/op
BenchmarkUnescape-12    	 4048627	       290 ns/op	      64 B/op	       2 allocs/op

@lab176 lab176 merged commit 496d327 into master Feb 18, 2021
@lab176 lab176 deleted the lbaier/expose-unescape-helpers branch February 18, 2021 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants