Streaming parsing of JSON with WebClient [SPR-17328] #21862

Closed
@spring-projects-issues

Julian Orth opened SPR-17328 and commented

Hi,

AsyncRestTemplate was deprecated in #19962 in favor of WebClient. However, WebClient does not seem to support all of the use cases that AsyncRestTemplate supports (and which RestTemplate does not support).

Example

Consider the following JSON:

{
    "a": [
        {
            "x": 2,
            "y": 1
        }
    ],
    "b": [
        {
            "x": 3,
            "y": 1
        }
    ]
} 

where both arrays (a and b) have 1,000,000,000 elements each. The goal is to calculate the sum of all x - y over both arrays. (E.g. (2 - 1) + (3 - 1) = 3 in the example above.) 

Solution with AsyncRestTemplate

With AsyncRestTemplate, this is easy: call AsyncRestTemplate#execute with a ResponseExtractor, plug the InputStream into a Jackson JsonParser, and use ObjectMapper to deserialize each array element on the fly into

class V {
    int x;
    int y;
}

Then update the sum and proceed to the next element. Since only one object of type V needs to be in memory at a time, the memory requirements are constant and low.

Overall, performing this streaming processing of the JSON can probably be done in 25 lines of code using Jackson and AsyncRestTemplate.
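For illustration, here is a minimal sketch of the Jackson part of that approach, assuming jackson-databind 2.x is on the classpath. The class name StreamingSum and the method sum are made up for this example, and V uses public fields so that Jackson can bind it without setters. In the setup described above, the InputStream would be the response body handed to the ResponseExtractor; here it is just an in-memory stream.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of Spring or Jackson.
public class StreamingSum {

    // Element type from the issue; public fields so Jackson can bind them directly.
    public static class V {
        public int x;
        public int y;
    }

    /** Sums x - y over the elements of every array in the document, one element at a time. */
    public static long sum(InputStream body) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        long sum = 0;
        try (JsonParser parser = mapper.getFactory().createParser(body)) {
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                if (token != JsonToken.START_ARRAY) {
                    continue; // skip the root object and the field names "a" and "b"
                }
                // Inside an array: every START_OBJECT begins exactly one element.
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    V v = mapper.readValue(parser, V.class); // binds one element only
                    sum += v.x - v.y;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"a\":[{\"x\":2,\"y\":1}],\"b\":[{\"x\":3,\"y\":1}]}";
        InputStream body = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
        System.out.println(sum(body)); // prints 3
    }
}
```

Only one V instance is reachable at any time, so memory use does not grow with the length of the arrays. The catch is that JsonParser#nextToken blocks on the InputStream, which is why this pattern does not carry over directly to a non-blocking client.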

The Problem with WebClient

With WebClient, this kind of processing seems to be practically impossible. Jackson appears to only support async parsing at the token level. Anything at a higher level (e.g. ObjectMapper) needs to have all tokens available in a blocking way to parse them.

Therefore, to implement the kind of streaming processing described above, I would have to manually keep track of the JSON tokens parsed and then plug them into an ObjectMapper all at once when I've detected the end of an array element. This is basically what Spring currently does to support streaming of top-level arrays:

WebClient.create().get().exchange().flatMapMany(r -> r.bodyToFlux(V.class)) 

However, even to support only this very limited streaming of top-level array elements, Spring had to re-implement about 200 lines of Jackson logic to keep track of the current depth in the token stream (Jackson2Tokenizer).
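To make the depth-tracking idea concrete, here is a deliberately simplified sketch. It is not Spring's Jackson2Tokenizer: it works on characters rather than Jackson tokens, it only handles a top-level array whose elements are objects or arrays, and the class name ArrayElementSplitter is made up for this example. It only shows the bookkeeping needed to know when one element is complete while the input arrives in arbitrary chunks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a chunked top-level JSON array into its elements.
public class ArrayElementSplitter {

    private final StringBuilder current = new StringBuilder(); // element being assembled
    private int depth = 0;            // nesting depth of '{' and '['
    private boolean inString = false; // true while inside a JSON string literal
    private boolean escaped = false;  // true right after a backslash inside a string

    /** Feeds one chunk and returns the elements that this chunk completed. */
    public List<String> feed(String chunk) {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            if (inString) {
                if (depth >= 2) current.append(c);
                if (escaped) {
                    escaped = false;          // this character was escaped, ignore it
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                if (depth >= 2) current.append(c);
            } else if (c == '{' || c == '[') {
                depth++;
                if (depth >= 2) current.append(c); // depth 1 is the outer array itself
            } else if (c == '}' || c == ']') {
                if (depth >= 2) current.append(c);
                depth--;
                if (depth == 1) {             // back at array level: one element is done
                    completed.add(current.toString());
                    current.setLength(0);
                }
            } else if (depth >= 2) {
                current.append(c);            // commas and whitespace between elements are dropped
            }
        }
        return completed;
    }
}
```

Each returned string could then be handed to ObjectMapper as a complete document. Generalizing this to arrays nested inside an object, as in the example at the top, means also tracking which field the parser is currently in, and that is the part that has to be written by hand today.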

Question

Since AsyncRestTemplate is deprecated, there no longer seems to be an encouraged and practical way in Spring 5 to do asynchronous streaming of JSON data. There are several ways to improve this situation:

  1. Un-deprecate AsyncRestTemplate
  2. Upstream complete async support in Jackson
  3. Provide a much expanded version of Jackson2Tokenizer to the public that handles more complicated cases such as the one described above

What are your thoughts on the matter and do you have plans to address this problem in a future release?

Thanks
Julian

PS: A similar problem exists on the server side. With web-mvc, an object returned from a REST endpoint would be streamed into the output stream via Jackson, keeping the memory requirements low. With webflux, a Mono<Object> returned from a REST endpoint will first be serialized into a String before it is written to the output stream.


Affects: 5.0.9

Labels: in: web (issues in web modules: web, webmvc, webflux, websocket); type: enhancement (a general enhancement)
