Caching: Add ETag support to save bandwidth #933

Closed
bart-degreed opened this issue Jan 29, 2021 · 2 comments · Fixed by #998

bart-degreed commented Jan 29, 2021

One of the often-cited advantages of REST is that it builds on the existing rich feature set of the HTTP standard, which is widely supported by clients, servers and intermediate proxies. One of those features is cache control to reduce network traffic.

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag for an introduction to ETags.

Note the confusing terminology here: a 'resource' in HTTP refers to the full response body, while a 'resource' in json:api refers to a section within the response body. For example, an HTTP resource could contain a collection of json:api resources, followed by a collection of related json:api resources. Or even partial json:api resources, when sparse fieldsets are used.

The use case here is reducing network traffic. It is not optimistic concurrency (due to the above mismatch), and it is not reducing database pressure (see below).

For example, consider a mobile client that wants to refresh its locally cached version of a json:api resource. The server sends back "Not Modified" if it determines the resource is still fresh. If stale, it sends back the full json:api resource. Likewise, the mobile client can check periodically whether a subset of the json:api resource attributes has changed, by repeatedly requesting it with a sparse fieldset. The same applies to a collection of resources: if the collection is unchanged since last time, the server does not need to send it again.
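
A minimal client-side sketch of this refresh loop, written in Python for brevity rather than in the project's own language. The `CachingClient` class and the `send` callable are hypothetical names, not part of JADNC or any library; `send` stands in for whatever HTTP stack the client uses. The sketch assumes the stored ETag is handed back to the server through the standard `If-None-Match` request header.

```python
from typing import Callable, Dict, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Dict[str, str], bytes]


class CachingClient:
    """Hypothetical client that revalidates cached bodies with ETags."""

    def __init__(self, send: Callable[[str, Dict[str, str]], Response]):
        self._send = send
        # Keyed by the full URL, so each sparse fieldset or include
        # combination gets its own cache entry.
        self._cache: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self._cache.get(url)
        if cached is not None:
            # Ask the server to skip the body if nothing changed.
            headers["If-None-Match"] = cached[0]

        status, response_headers, body = self._send(url, headers)

        if status == 304 and cached is not None:
            # Still fresh: reuse the locally stored body.
            return cached[1]

        etag = response_headers.get("ETag")
        if status == 200 and etag is not None:
            self._cache[url] = (etag, body)
        return body
```

Because the cache is keyed by the complete URL, a request with `fields[articles]=title` is revalidated independently of a request for the full resource.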

This proposal optimizes network traffic, not database pressure. For each request, the server still needs to fetch the data in order to determine whether it has changed. When caching database results is desired, developers can already implement their own IResourceRepository and apply various caching strategies there. That is outside the scope of this proposal.

To implement this, JADNC sends back a strong ETag header on GET responses, which contains an MD5 hash of the binary response body. For an incoming GET request that carries a previously received ETag in its If-None-Match header, the server renders the response internally and calculates its MD5 hash. When both hashes are identical, the server returns 304 Not Modified with an empty body.
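
The server-side check described above can be sketched roughly as follows. This is an illustration in Python, not JADNC's actual implementation; `make_etag` and `respond` are hypothetical helper names, and the sketch assumes the client sends its stored ETag in the standard `If-None-Match` header.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Build a strong ETag: the quoted hex MD5 digest of the body bytes."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for an already rendered GET response."""
    etag = make_etag(body)
    headers = {"ETag": etag}

    if if_none_match is not None:
        # The header may list several comma-separated ETags.
        candidates = []
        for candidate in if_none_match.split(","):
            candidate = candidate.strip()
            # If-None-Match uses weak comparison, so drop any W/ prefix.
            if candidate.startswith("W/"):
                candidate = candidate[2:]
            candidates.append(candidate)

        if "*" in candidates or etag in candidates:
            # Hashes match: skip the body to save bandwidth.
            return 304, headers, b""

    return 200, headers, body
```

Note that the body is always rendered before the comparison, which is why this approach saves bandwidth but not database or rendering work.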

If calculating the hash adds substantial overhead, we should put it behind an option that is disabled by default.

Example:

GET /api/blogs/1/articles?include=author&fields[articles]=title HTTP/1.1

HTTP/1.1 200 OK
ETag: "51142bc1-7449-479b075b2891b"
Content-Type: application/vnd.api+json

{
  "data": [{ ... }],
  "included": [{ ... }]
}

Then, after some time:

GET /api/blogs/1/articles?include=author&fields[articles]=title HTTP/1.1
If-None-Match: "51142bc1-7449-479b075b2891b"

HTTP/1.1 304 Not Modified

bart-degreed commented

Did some measurements. Running 1500 requests on my machine takes 0:56:763 without and 0:57:251 with MD5 hashing. This results in roughly 0.9% overhead, so it is not worth making configurable.
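
For reference, the overhead arithmetic can be reproduced with a few lines of Python. This assumes the timings above read as minutes:seconds:milliseconds, that is 56.763 s and 57.251 s; `overhead_percent` is a hypothetical helper name.

```python
def overhead_percent(baseline_seconds: float, measured_seconds: float) -> float:
    """Relative slowdown of the measured run versus the baseline, in percent."""
    return (measured_seconds - baseline_seconds) / baseline_seconds * 100.0


# Assumed reading of the reported timings (min:sec:ms).
baseline = 56.763   # 1500 requests without MD5 hashing
with_md5 = 57.251   # 1500 requests with MD5 hashing

print(f"{overhead_percent(baseline, with_md5):.2f}%")  # prints 0.86%
```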
