-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Closed
dotnet/corefx
#39524Labels
api-approvedAPI was approved in API review, it can be implementedAPI was approved in API review, it can be implementedarea-System.Text.Json
Milestone
Description
Motivation:
-
If the user wants to write string literals (or known strings), checking to escape is an unnecessary expense.
- The previous solution (to provide writer method overloads that takes an
escape
bool) allowed callers to skip validation, and that meant embedded quotes or invalid UTF-8 could be written, in attempts to get performance - A benchmark like this regresses 10-15% without having the ability to skip escaping: https://github.com/dotnet/performance/blob/49d604c4b4f637b286f37f1eaa696c4d5785edfe/src/benchmarks/micro/corefx/System.Text.Json/Utf8JsonWriter/Perf.Basic.cs#L52-L83
- Today, for best performance, users have to store the UTF-16 string literals in static UTF-8 encoded
byte[]
to avoid the transcoding step. See https://github.com/aspnet/AspNetCore/blob/56c064bd53ed064d82ffb0b5c4c776da87c02326/src/SignalR/common/SignalR.Common/src/Protocol/HandshakeProtocol.cs#L21-L65
- The previous solution (to provide writer method overloads that takes an
-
Today, for defense in depth, the writer over-escapes by default. This includes the characters that is required to be escaped according to the JSON spec, but also all non-ASCII characters, and some within the ASCII range as well (
+
,&
, etc.).- That might be too aggressive for some scenarios. For example, users may want to opt-out of non-ASCII escaping
- Aside from human readability (which is primarily cosmetic), over-escaping does end up affecting the payload size (since it could be a 2-6x expansion depending on the payload)
- See https://github.com/dotnet/corefx/issues/35385 / https://github.com/dotnet/corefx/issues/34958
API Proposal:
namespace System.Text.Json
{
// JsonEncodedString? JsonEscapedString/Text/Value?
public sealed class JsonEncodedText
{
// Solves motivation 1: Default escaping (internally uses equivalent to JavaScriptEncoder.Default)
public JsonEncodedText(string value);
public JsonEncodedText(ReadOnlySpan<char> value);
public JsonEncodedText(ReadOnlySpan<byte> utf8Value);
// Solves motivation 2: Custom escaping via JavaScriptEncoder
// OR optional encoder parameter where null means default?
public JsonEncodedText(string value, JavaScriptEncoder encoder);
public JsonEncodedText(ReadOnlySpan<char> value, JavaScriptEncoder encoder);
public JsonEncodedText(ReadOnlySpan<byte> utf8Value, JavaScriptEncoder encoder);
public byte[] EncodedUtf8String { get ; } // OR ReadOnlySpan<byte>?
public string EncodedString { get ; } // OR ReadOnlySpan<char>?
public JavaScriptEncoder Encoder { get ; }
public bool IsAlreadyEncoded(JavaScriptEncoder encoder);
public override string ToString();
}
public partial struct JsonWriterOptions
{
// Exists:
public bool Indented { get; set { } }
public bool SkipValidation { get; set { } }
// New:
public JavaScriptEncoder Encoder { get; set { } }
}
public sealed partial class Utf8JsonWriter : System.IDisposable
{
// ...
// other overloads
// ...
public void WriteStringValue(string value) { }
// All overloads that take string get one that takes JsonEncodedText
public void WriteStringValue(JsonEncodedText value) { }
public void WriteString(JsonEncodedText propertyName, DateTime value) { }
public void WriteString(JsonEncodedText propertyName, DateTimeOffset value) { }
public void WriteString(JsonEncodedText propertyName, Guid value) { }
public void WriteString(string propertyName, JsonEncodedText value) { }
public void WriteString(JsonEncodedText propertyName, ReadOnlySpan<byte> utf8Value) { }
public void WriteString(JsonEncodedText propertyName, ReadOnlySpan<char> value) { }
public void WriteString(JsonEncodedText propertyName, string value) { }
public void WriteString(JsonEncodedText propertyName, JsonEncodedText value) { }
public void WriteString(ReadOnlySpan<char> propertyName, JsonEncodedText value) { }
public void WriteString(ReadOnlySpan<byte> utf8PropertyName, JsonEncodedText value) { }
public void WriteStartObject(JsonEncodedText propertyName) { }
public void WriteStartArray(JsonEncodedText propertyName) { }
public void WriteBoolean(JsonEncodedText propertyName, bool value) { }
public void WriteNull(JsonEncodedText propertyName) { }
public void WriteNumber(JsonEncodedText propertyName, decimal value) { }
public void WriteNumber(JsonEncodedText propertyName, double value) { }
public void WriteNumber(JsonEncodedText propertyName, int value) { }
public void WriteNumber(JsonEncodedText propertyName, long value) { }
public void WriteNumber(JsonEncodedText propertyName, float value) { }
[System.CLSCompliantAttribute(false)]
public void WriteNumber(JsonEncodedText propertyName, uint value) { }
[System.CLSCompliantAttribute(false)]
public void WriteNumber(JsonEncodedText propertyName, ulong value) { }
}
}
Sample usage within JsonWriter:
public void WriteStringValue(JsonEncodedText value)
{
// skip escaping check, skip escaping
if (value.IsAlreadyEncoded(Options.Encoder))
{
WriteStringValueHelper(value.EncodedUtf8String);
}
else
{
// Existing UTF-8 write string API, which checks for escaping/etc.
WriteStringValue(value.EncodedUtf8String);
}
}
Sample user code:
// Custom encoder optional
private static readonly JsonEncodedText FirstJson = new JsonEncodedText("first");
private static readonly JsonEncodedText LastJson = new JsonEncodedText("last");
private static readonly JsonEncodedText AgeJson = new JsonEncodedText("age", JavaScriptEncoder.Default);
public void WriteBasicJsonEncoded(ArrayBufferWriter<byte> output)
{
// Default encoder is `JavaScriptEncoder.Default`
using (var json = new Utf8JsonWriter(output, new JsonWriterOptions { Indented = false, SkipValidation = true, Encoder = JavaScriptEncoder.Default }))
{
json.WriteStartObject();
json.WriteNumber(AgeJson, 42);
json.WriteString(FirstJson, "John");
json.WriteString(LastJson, "Smith");
json.WriteEndObject();
json.Flush();
}
}
Questions:
- Should
JsonEncodedText
override the encoder passed in to theUtf8JsonWriter
viaJsonWriterOptions
or should there be no escaping?- If the
Utf8JsonWriter
always wins, we don't need theIsAlreadyEncoded
. - struct or class?
- If the
- Do we want optional ctor arguments that take
JavaScriptEncoder
or just overloads? - Should we return
ROS<char>
/ROS<byte>
or regularstring
/byte[]
? - Type names? JsonEncodedString? JsonEscapedString/Text/Value?
- Should this new
JsonWriterOption
be exposed via the serializer? - Should this implement
IEquatable<JsonEncodedText>
?
Notes:
- This proposal depends on
TextEncoder
/JavaScriptEncoder
being in-box so thatSystem.Text.Json
can reference it. - It also depends on new UTF-8 specific functionality on the
JavaScriptEncoder
: https://github.com/dotnet/corefx/issues/34830 / https://github.com/dotnet/corefx/issues/33509 - Here's a prototype implementation: dotnet/corefx@master...ahsonkhan:JsonEncodedString
cc @ycrumeyrolle, @ccnxpt, @GrabYourPitchforks, @steveharter, @bartonjs, @BrennanConroy, @JamesNK, @rynowak, @KrzysztofCwalina, @stephentoub
petrsvihlik
Metadata
Metadata
Assignees
Labels
api-approvedAPI was approved in API review, it can be implementedAPI was approved in API review, it can be implementedarea-System.Text.Json