Use Utf8Json as the internal serializer #3493

Closed
@russcam

This issue has been opened to discuss moving the internal serializer from Json.NET over to a faster JSON serialization library.

The feature/utf8json-serializer branch contains a minimal viable prototype of deserializing an ISearchResponse<T> and serializing ISearchRequest.

Some key observations working with utf8json whilst putting together this prototype:

  1. Hit<T> requires a custom formatter to be resolved at the IJsonFormatterResolver level because it contains a generic type property whose formatter, SourceFormatter<T>, cannot be resolved using JsonFormatterAttribute. If it could be resolved that way, Hit<T> could be attributed with [JsonFormatter(typeof(HitFormatter<>))] and the _source field with [JsonFormatter(typeof(SourceFormatter<>))]. For now, an instance of SourceFormatter<T> is initialized inside the HitFormatter<T> constructor.

  2. The implementation does not handle different field casings.

  3. HitFormatter<T> avoids allocating strings when reading property names by using AutomataDictionary. This dictionary lives outside of the generic HitFormatter<T> to avoid creating an instance of the dictionary for each T.

  4. Both JsonReader and JsonWriter are structs passed by ref, so they cannot be captured inside local
    functions or lambda expression bodies; they must instead be passed as a ref parameter to a function. An example is JoinFieldFormatter's Serialize method.

  5. utf8json has no equivalent of [JsonObject(MemberSerialization.OptIn)], which serializes
    only those members that have been explicitly attributed with DataMemberAttribute.
    Ideally this would be supported, as it is cumbersome to set [IgnoreDataMember]
    on every property that should be ignored.

  6. ConnectionSettings is retrieved by casting IJsonFormatterResolver to a known concrete
    implementation that exposes it as a property. Not ideal, but it works.

  7. utf8json does not make a distinction between an integer token and a float token as Json.NET
    does. This is not so much of a problem, since the bytes for the token can be inspected to determine
    if they contain a decimal point, and use utf8json's internals to deserialize accordingly. Also, this
    is needed only in cases where an integer/double distinction is necessary. See FuzzinessFormatter
    for an example.

  8. The equivalent to JsonConverter, IJsonFormatter<T>, only has a generic variant. In several places
    in the client, we may serialize using an interface, but deserialize using the concrete implementation.
    This is handled by ConcreteInterfaceFormatter<TConcrete, TInterface>, where the formatter
    is IJsonFormatter<TInterface>. An interesting case is when the concrete type should be serialized
    as the interface; in such scenarios, we end up with two formatters, one for the concrete type and one
    for the interface, where each formatter references the other's serialize/deserialize implementation. See
    QueryContainerFormatter and QueryContainerInterfaceFormatter for an example.
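The constraint in observation 4 can be sketched without utf8json at all. The snippet below uses a hypothetical ByteWriter struct as a stand-in for JsonWriter (it is not utf8json's type, and the names RefWriterExample, WriteItem and WriteAll are made up for illustration). The per-item work lives in a static helper that takes the writer as a ref parameter, because a lambda or local function that touched the ref parameter would be rejected by the C# compiler with error CS1628.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a writer struct that is passed around by ref.
public struct ByteWriter
{
    public int Written;

    // Counts bytes instead of buffering them, to keep the sketch small.
    public void WriteByte(byte value) => Written++;
}

public static class RefWriterExample
{
    // The helper receives the writer by ref, so it mutates the caller's struct
    // rather than a copy. Writing this as a lambda inside WriteAll that used
    // the `writer` parameter would not compile (CS1628).
    private static void WriteItem(ref ByteWriter writer, byte item) =>
        writer.WriteByte(item);

    public static int WriteAll(ref ByteWriter writer, IEnumerable<byte> items)
    {
        foreach (var item in items)
            WriteItem(ref writer, item);

        return writer.Written;
    }
}
```

The cost of this shape is that every helper needs the writer threaded through its signature, which is presumably why the observation calls it out.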
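For observation 7, a minimal sketch of the byte inspection might look like the following. It assumes the raw bytes of the number token are already available as an ArraySegment<byte> (in utf8json they would presumably come from the reader, for example via JsonReader.ReadNumberSegment()). NumberTokenInspector and ContainsDecimalPoint are hypothetical names, and this is not the code in FuzzinessFormatter.

```csharp
using System;

public static class NumberTokenInspector
{
    // Returns true when the UTF-8 bytes of a JSON number token contain a
    // decimal point, i.e. the token should be read as a double rather than
    // as an integer. Exponent-only forms such as 1e5 are not treated as
    // doubles here; that is a simplification in this sketch.
    public static bool ContainsDecimalPoint(ArraySegment<byte> numberToken)
    {
        for (var i = 0; i < numberToken.Count; i++)
        {
            if (numberToken.Array[numberToken.Offset + i] == (byte)'.')
                return true;
        }

        return false;
    }
}
```

A formatter could branch on the result and then parse the same segment as either an integer or a double, so the check is only paid for where the distinction matters.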

Benchmarking the feature/utf8json-serializer branch against the 6.4.0 nuget package, deserializing a fixed byte response of 100, 1000 or 10,000 Stackoverflow questions, gives the following results.

BenchmarkDotNet=v0.11.2.856-nightly, OS=Windows 10.0.17134.285 (1803/April2018Update/Redstone4)
Intel Core i7-4980HQ CPU 2.80GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.1.500
  [Host]     : .NET Core 2.1.6 (CoreCLR 4.6.27019.06, CoreFX 4.6.27019.05), 64bit RyuJIT
  Job-EXDGCR : .NET Core 2.1.6 (CoreCLR 4.6.27019.06, CoreFX 4.6.27019.05), 64bit RyuJIT
MinInvokeCount=30  MinIterationTime=500.0000 ms  Jit=RyuJit  
Platform=AnyCpu

100 Stackoverflow questions

| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| Search6x | 1,786.4 us | 15.78 us | 14.76 us | 1,785.3 us | 1.00 | 0.00 | 87.8906 | 42.9688 | - | 540.57 KB |
| Search6xAsync | 1,810.0 us | 36.57 us | 50.06 us | 1,792.0 us | 1.02 | 0.03 | 87.8906 | 42.9688 | - | 541.06 KB |
| Search6xJsonNetSerializer | 5,557.4 us | 86.19 us | 80.62 us | 5,547.7 us | 3.11 | 0.04 | 554.6875 | 250.0000 | 15.6250 | 3450.89 KB |
| Search6xJsonNetSerializerAsync | 4,923.2 us | 101.23 us | 204.48 us | 4,929.8 us | 2.72 | 0.10 | - | - | - | 3451.38 KB |
| SearchBleeding | 933.0 us | 23.24 us | 67.43 us | 911.9 us | 0.51 | 0.02 | - | - | - | 679.26 KB |
| SearchBleedingAsync | 949.2 us | 24.30 us | 70.88 us | 931.7 us | 0.54 | 0.02 | - | - | - | 679.96 KB |
| SearchBleedingJsonNetSerializer | 931.4 us | 22.70 us | 64.76 us | 917.0 us | 0.53 | 0.04 | - | - | - | 679.26 KB |
| SearchBleedingJsonNetSerializerAsync | 926.7 us | 25.70 us | 74.97 us | 901.9 us | 0.53 | 0.05 | - | - | - | 679.96 KB |

1000 Stackoverflow questions

| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| Search6x | 19.473 ms | 0.1661 ms | 0.1554 ms | 19.511 ms | 1.00 | 0.00 | 625.0000 | 281.2500 | 62.5000 | 3.71 MB |
| Search6xAsync | 15.165 ms | 0.2438 ms | 0.2162 ms | 15.121 ms | 0.78 | 0.01 | - | - | - | 3.71 MB |
| Search6xJsonNetSerializer | 50.683 ms | 0.9887 ms | 1.6790 ms | 50.935 ms | 2.63 | 0.09 | 4000.0000 | 1000.0000 | - | 29.6 MB |
| Search6xJsonNetSerializerAsync | 50.297 ms | 1.0050 ms | 1.6229 ms | 50.319 ms | 2.56 | 0.10 | 4000.0000 | 1000.0000 | - | 29.6 MB |
| SearchBleeding | 8.276 ms | 0.1817 ms | 0.5328 ms | 7.994 ms | 0.43 | 0.03 | - | - | - | 6.38 MB |
| SearchBleedingAsync | 7.887 ms | 0.1972 ms | 0.3012 ms | 7.790 ms | 0.41 | 0.02 | - | - | - | 6.38 MB |
| SearchBleedingJsonNetSerializer | 8.189 ms | 0.1745 ms | 0.4565 ms | 7.954 ms | 0.42 | 0.02 | - | - | - | 6.38 MB |
| SearchBleedingJsonNetSerializerAsync | 7.862 ms | 0.2369 ms | 0.4149 ms | 7.687 ms | 0.40 | 0.02 | - | - | - | 6.38 MB |

10,000 Stackoverflow questions

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|
| Search6x | 203.2 ms | 3.901 ms | 3.257 ms | 1.00 | 0.00 | 6000.0000 | 2000.0000 | - | 36.39 MB |
| Search6xAsync | 205.6 ms | 2.221 ms | 2.078 ms | 1.01 | 0.01 | 6000.0000 | 2000.0000 | - | 36.39 MB |
| Search6xJsonNetSerializer | 558.1 ms | 8.880 ms | 8.306 ms | 2.75 | 0.06 | 49000.0000 | 8000.0000 | - | 298.77 MB |
| Search6xJsonNetSerializerAsync | 564.7 ms | 5.126 ms | 4.544 ms | 2.78 | 0.05 | 49000.0000 | 8000.0000 | - | 298.77 MB |
| SearchBleeding | 117.4 ms | 1.359 ms | 1.271 ms | 0.58 | 0.01 | 4000.0000 | 1000.0000 | - | 90.12 MB |
| SearchBleedingAsync | 114.2 ms | 1.980 ms | 1.852 ms | 0.56 | 0.01 | 4000.0000 | 1000.0000 | - | 90.12 MB |
| SearchBleedingJsonNetSerializer | 118.2 ms | 1.572 ms | 1.471 ms | 0.58 | 0.01 | 4000.0000 | 1000.0000 | - | 90.12 MB |
| SearchBleedingJsonNetSerializerAsync | 112.8 ms | 2.244 ms | 1.989 ms | 0.55 | 0.01 | 4000.0000 | 1000.0000 | - | 90.12 MB |

  • Search6x* methods use the 6.4.0 nuget package
  • SearchBleeding* methods use the utf8json branch

A nice advantage of using utf8json as the internal serializer is that the handoff to a custom serializer can be done using a MemoryStream constructed from an ArraySegment<byte>. This avoids the need to read into a JToken and construct a Stream from the token, which greatly reduces serialization time and allocations.
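A minimal sketch of that handoff, assuming the bytes of the value to hand off (for example the _source object) have already been located as an ArraySegment<byte> over the response buffer. SourceHandoff and AsStream are hypothetical names, not types from the branch; the MemoryStream constructor is the standard .NET one.

```csharp
using System;
using System.IO;

public static class SourceHandoff
{
    // Exposes the bytes of a single JSON value as a read-only stream.
    // The stream is a view over the existing buffer, so no bytes are copied
    // and no intermediate token tree is built.
    public static MemoryStream AsStream(ArraySegment<byte> jsonValue) =>
        new MemoryStream(jsonValue.Array, jsonValue.Offset, jsonValue.Count, writable: false);
}
```

The custom serializer then reads from the returned stream as it would from any other Stream, and it sees only the bytes inside the segment rather than the whole response.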

Allocated memory/op

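The allocated memory per op is higher with utf8json than with the 6.4.0 internal serializer at every response size. To determine whether this was a fixed amount of allocated memory per op, two searches were performed per benchmark method. The amount of allocated memory doubles (for example, from 90.12 MB to 180.25 MB for SearchBleeding), so the extra allocation scales with the number of requests rather than being a fixed overhead.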

10,000 Stackoverflow questions with 2 search requests per benchmarked method

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|---|
| Search6x | 402.3 ms | 3.155 ms | 2.951 ms | 1.00 | 0.00 | 13000.0000 | 5000.0000 | 1000.0000 | 72.77 MB |
| Search6xAsync | 408.8 ms | 11.595 ms | 16.630 ms | 1.03 | 0.05 | 13000.0000 | 5000.0000 | 1000.0000 | 72.78 MB |
| Search6xJsonNetSerializer | 1,100.1 ms | 7.118 ms | 6.658 ms | 2.73 | 0.03 | 101000.0000 | 19000.0000 | 2000.0000 | 597.54 MB |
| Search6xJsonNetSerializerAsync | 1,037.2 ms | 5.950 ms | 5.566 ms | 2.58 | 0.02 | 100000.0000 | 19000.0000 | 1000.0000 | 597.54 MB |
| SearchBleeding | 259.7 ms | 3.799 ms | 3.368 ms | 0.65 | 0.01 | 9000.0000 | 4000.0000 | 1000.0000 | 180.25 MB |
| SearchBleedingAsync | 248.5 ms | 3.518 ms | 3.291 ms | 0.62 | 0.01 | 9000.0000 | 4000.0000 | 1000.0000 | 180.25 MB |
| SearchBleedingJsonNetSerializer | 260.5 ms | 2.991 ms | 2.652 ms | 0.65 | 0.01 | 9000.0000 | 4000.0000 | 1000.0000 | 180.25 MB |
| SearchBleedingJsonNetSerializerAsync | 246.3 ms | 3.146 ms | 2.943 ms | 0.61 | 0.01 | 9000.0000 | 4000.0000 | 1000.0000 | 180.25 MB |
