Skip to content

API Proposal: System.Text.Json: IAsyncEnumerable support #38055

@logiclrd

Description

@logiclrd

Background and Motivation

System.Text.Json provides a generalized mechanism for converting data between object graphs and a textual JSON representation. In general, the mapping between object representation and JSON format is automated, with the JSON structure inferred from the object structure by means of reflection. When System.Text.Json encounters a value of type IEnumerable or IEnumerable<T>, it translates it into a JSON array, and the way in which it does so is unbuffered, meaning that the output stream will receive objects as they are returned by the IEnumerable. In some cases, this is used to stream results in realtime, producing the JSON for objects only as they become available; the IEnumerable's sequence is not predetermined prior to the start of serialization.

This particular function is tied into the way ASP.NET Core implements endpoints. If an endpoint route method returns a value of type IEnumerable or IEnumerable<T>, then ASP.NET Core will (subject to some degree of caching) stream the results of the enumeration process to the client as they become available.

A natural progression of this is to define a route endpoint that returns IAsyncEnumerable<T>, so that any blocking operations used to retrieve the objects as they are returned don't tie up a thread (this being the entire rationale of the async Task model in the first place). However, if IAsyncEnumerable<T> is returned from a route endpoint, ASP.NET Core will try to read to the end of the IAsyncEnumerable<T> and buffer it entirely in memory. This is a work-around put in place because if it passes it directly to the serialization infrastructure, then what is actually sent to the client is the string {}. Because System.Text.Json lacks any processing of IAsyncEnumerable<T>, it treats it as a generic object type, on which it finds no meaningful properties to serialize.

Ultimately, ASP.NET Core should not be doing in-memory buffering of any IAsyncEnumerable<T> result. This completely defeats the entire purpose of both an enumerable type and an async pattern. It also means that if someone writes a web endpoint designed to simply continue sending data as long as the client remains connected, with an IAsyncEnumerable<T> whose enumerator never ends, then ASP.NET Core will attempt to buffer this infinite sequence of objects and eventually run out of memory. With proper support in the underlying serialization infrastructure, ASP.NET Core can remove this buffering and dynamically stream IAsyncEnumerable<T> sequences in the same manner as it already does IEnumerable<T>.

I believe that systematic support for IAsyncEnumerable<T> within System.Text.Json would be useful, not just to support this ASP.NET Core situation, but also to fill out the set of standard system types that can be serialized & deserialized. If System.Text.Json is expected to automatically "do the right thing" with IEnumerable and IEnumerable<T>, it is not unreasonable for a consumer to assume it will also "do the right thing" with IAsyncEnumerable<T>, and this is not presently the case.

Proposed API

I propose updating System.Text.Json so that it automatically serializes values of type IAsyncEnumerable<T> in much the same way that it currently automatically serializes IEnumerable<T>. Similarly, when deserializing, if it is asked to supply a value of type IAsyncEnumerable<T>, then, given as input a JSON array where each element can be deserialized as T, it should be able to supply some value implementing IAsyncEnumerable<T> that contains the deserialized data.

I do not believe there is any merit in requiring consumers to explicitly opt into or register this function; it should be a standard part of how JsonSerializer.Serialize and JsonSerializer.Deserialize work, exactly as IEnumerable<T> already is.

A complication of this is that in a high-volume setting, it is imperative that the engine not be invoking async operations from a synchronous context. As such, implementing this will require that JsonConverter be made fundamentally async. Most of its implementations will not actually operate in an asynchronous fashion, but the call stack must implement it this way end-to-end in order for IAsyncEnumerable calls that block to a continuation to release the thread back to the task scheduler. This is a breaking change that could only be introduced under a new major version number.

An alternative would be to make JsonConverter support both synchronous and async modes of operation, though in this situation, the engine would need to throw an exception if it encounters an IAsyncEnumerable object further down the call stack after having transitioned to a synchronous context. I believe this would be an antipattern that would encourage people to write code that works around the issue in ways susceptible to deadlock under some circumstances.

On deserialization, I believe the only rational approach is to buffer the entire sequence, which is in fact what the IEnumerable/IEnumerable<T> support already does. The reason for this is that the object graph could contain multiple IAsyncEnumerable<T> values, and if they were actually being asynchronously deserialized, each would require its own stream pointer in the underlying data, which is obviously not in a general sense possible. Thus, I believe System.Text.Json should provide a simple object based on List<T> that provides a minimal synchronous IAsyncEnumerable<T> implementation, so that deserialized values of type IAsyncEnumerable<T> provide the core behaviour of IAsyncEnumerable<T> with regard to enumeration.

Usage Examples

Example 1: Convert database query results into JSON

Assume the existence of a database adapter class that can retrieve records using the async model, returning the elements one at a time so that the entire result set is not held in memory:

public class Database
{
  public async Task<IAsyncEnumerable<Order>> GetAllOrders() { ... }
}

A consumer of this might convert all orders to JSON format with a statement such as:

Database database = ...;

var ordersJson = JsonSerializer.Serialize(await database.GetAllOrders());

(This currently returns the string {}.)

Example 2: ASP.NET Core endpoint using async programming model

Pseudocode:

interface IProcessor
{
  Task<IAsyncEnumerable<IOperation>> GetOperationsAsync();
}

interface IOperation
{
  bool IsCompleted { get; }
  Task<IAsyncEnumerable<IResult>> GetResultsAsync();
}

[Route("/v1")]
public class APIController : Controller
{
  IProcessor _processor;

  public async Task<IAsyncEnumerable<Result>> GetResults()
  {
    await foreach (var operation in await _processor.GetOperationsAsync())
    {
      if (operation.IsCompleted)
      {
        await foreach (var result in await operation.GetResultsAsync())
          yield return result;
      }
    }
  }
}

Alternative Designs

I think all alternatives currently available are antipatterns in that they require the consumer to be responsible for the way in which IAsyncEnumerable<T> is processed and serialized, repeating common code across many projects, sacrificing maintainability and performance.

Risks

I cannot think of any ways in which changing IAsyncEnumerable<T> serialization so that it returns a meaningful JSON array instead of an empty JSON object could cause problems. The only possibility I can think of is if someone were to make an object that had properties and also implemented IAsyncEnumerable directly, they might currently have code that expects the properties to be serialized. This seems like an exceedingly odd pattern to me that I suspect nobody has actually depended upon to date.

One possibility is an object that implements both IEnumerable<T> and IAsyncEnumerable<T>. Presently, this object serializes as an array because System.Text.Json picks up the IEnumerable<T> implementation. With this proposal, it would instead pick up IAsyncEnumerable<T>. It is my belief that if an object implements both, it would probably prefer to be enumerated asynchronously. If such an object did exist and provided different sequences, though, then this change would cause the serialized data to change.

Partial Implementation

I created an implementation of this prior to understanding the API proposal review process, and prior to fully comprehending the async-over-sync problem. It lives in PR #37981. The implementation reuses as much existing code as possible by leveraging existing serialization & deserialization functionality designed for the IEnumerable interface. IAsyncEnumerable<T> is not directly related to IEnumerable, but shares obvious genetic material with it.

This implementation would need to be updated to provide a natively-asynchronous model for JsonConverter in order to fully satisfy the needs of this proposal.

My proposed implementation also provides a simple List<T> implementing IAsyncEnumerable<T> as a place to put deserialized data so that code enumerating those IAsyncEnumerable<T> values can read it (even though it isn't actually async -- something which I don't believe to be possible anyway).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions