Question: is schema information embeddable with the instances (actual data)? #660
Comments
@anatoli26 Hyper-Schema is primarily intended to be used as a separate document from the instance data. Linking the two is discussed in section 10 of the Core spec (see also issue #601 for additional ideas). As the hyper-schema is linked to the instance at runtime, there are several ways that Hyper-Schema can be a dynamic hypermedia format:
The first of these is the most flexible, although since one of the advantages of serving the hyper-schema as a separate resource is that it need not be fetched repeatedly at runtime, you can go overboard with it. But some good use cases are serving a different hyper-schema based on the authorization level of the logged-in user, or on whether some back-end condition makes a particular state change impossible. For Finally, you can use Hyper-Schema along with a format like HAL to augment HAL's capabilities. In such a case, you would just have a URI template consisting of a single variable: the field where HAL provides the complete URI. Does any of this help? I know I'm kind of rambling on through several points here. |
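As a rough illustration of the authorization use case, the sketch below picks a hyper-schema URI from the caller's authorization level and advertises it in a `Link` header with the `describedby` relation. The role names, the `SCHEMA_BY_ROLE` table, the `example.com` URIs, and both helper functions are hypothetical; only the idea of linking the schema through a header comes from the discussion above.

```python
# Hypothetical mapping from authorization level to hyper-schema URI.
SCHEMA_BY_ROLE = {
    "anonymous": "https://example.com/schemas/comments-readonly",
    "author": "https://example.com/schemas/comments-editable",
}


def link_header_for(role: str) -> str:
    """Return a Link header value pointing at the hyper-schema for this role."""
    # Unknown roles fall back to the most restrictive schema.
    uri = SCHEMA_BY_ROLE.get(role, SCHEMA_BY_ROLE["anonymous"])
    return f'<{uri}>; rel="describedby"'


def response_headers(role: str) -> dict:
    """Headers for an instance response; the body itself stays plain JSON."""
    return {
        "Content-Type": "application/json",
        "Link": link_header_for(role),
    }
```

The instance document is untouched in this design: what changes per request is only which schema the server points to.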
Henry, thanks for your detailed and fast reply. My 1st question is answered, though I got a new related question: if there's no way to specify a header (e.g. the media doesn't have the "header" concept), how should I pass this information, if the only 2 methods (the Link header and the Content-Type header suffix) are based on headers? With respect to the 2nd question (the hypermedia use), do you mean that if returning a collection of resources (like the long example at the bottom of my initial question), there's no way to indicate a specific set of actions/relations for each item of the collection, without resorting to ad-hoc properties like {writable: true/false}? One of my use cases is exactly what you've mentioned: serving different results based on the authorization level of the logged-in user, but the issue is that there could be several items in the same collection that have a particular set of actions/relations, not all of them. Example: there is a list of comments for an article ( The server knows which comments belong to the user so it could embed corresponding links for actions/relations to each comment like in my example. But if there's no way of embedding Hyper-Schema information on a per-item basis, the only way to handle this is by adding ad-hoc properties to each entity of the system? In this case, won't we have a polluted schema definition? Won't it affect the allowed fields to be sent to the server on POST/PUT requests? Sure we could somehow ignore these ad-hoc properties (though there could be unnecessary issues at validation), but that doesn't sound like a clean approach, does it? According to the definition of the HATEOAS constraint:
What are the best practices for using Hyper-Schema to comply with it? I can't understand how the server could dynamically indicate the aspects of each resource without resorting to ad-hoc properties. If there is actually no clean way to use Hyper-Schema for the HATEOAS constraint alone, what are the best practical implementations of combining Hyper-Schema with other specifications? My objective is to be able to define a fully RESTful client automatically based on the schema/api definition (with a reasonable level of manual refinements for specific customizations). JSON Schema + Hyper-Schema looks like almost a complete standalone solution. |
Additional thoughts. AFAICU, JSON Schema is static in nature. That makes sense as the entities' definitions don't change often and we can manage the changes with schema versions. But AFAIK, hypermedia is dynamic in nature: it needs to be able to communicate a varying state on each server reply. At the same time, as Hyper-Schema is a vocabulary extension of JSON Schema, it probably inherits its static nature. What I would like to understand is whether the inherited static nature of Hyper-Schema is a fundamental limitation or whether there are approaches that I'm not aware of to make it dynamic. If there is no way to make it dynamic, IMO it's not well suited for the hypermedia as the engine of application state specification. In this case, probably some other HATEOAS specification that supports actions, like Mason or Hydra, could be used with JSON Schema to achieve the goal of (almost) fully automated client definition based on server-provided information. If this is actually the best approach, do you know any such combination, defined in a ready-to-use specification, or successfully implemented as a custom solution in some project? The only specification I'm aware of is the Web Thing Model, but it is targeted specifically at IoT communications and is not well suited for generic RESTful APIs. |
If you use a media type anywhere (in a configuration file, for example), you can use the media type parameter form. Otherwise, there is no specific recommendation for linking schemas and instances. This is for two reasons:
Per RFC 7231, whatever representation you receive from doing an HTTP GET on a resource is the representation of that single resource. If that resource happens to overlap with other resources, as is the case with collection resources, that is part of your resource design and not part of HTTP's semantics. You can, with hyper-schema, have a separate link defined for each of the items in the collection, using the array form of the For a variable-length/variable-content array, you would presumably need to generate the hyper-schema each time the collection is requested. If the array is relatively static, you could make use of caching, but if it is very dynamic you would re-generate the hyper-schema each time.
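A minimal sketch of generating such a hyper-schema for each collection request follows. It assumes the truncated sentence above refers to the array form of `items`, where the i-th subschema applies to the i-th element; the comment structure, the `/comments/{id}` URIs, and the `build_collection_hyper_schema` helper are hypothetical, and the choice of allowed methods is only illustrative.

```python
def build_collection_hyper_schema(comments, current_user):
    """Build a hyper-schema whose i-th `items` entry describes the i-th comment."""
    item_schemas = []
    for comment in comments:
        # Hypothetical rule: only the author may modify or delete a comment.
        owned = comment["author"] == current_user
        allow = ["GET", "PUT", "DELETE"] if owned else ["GET"]
        item_schemas.append({
            "links": [{
                "rel": "self",
                "href": f"/comments/{comment['id']}",
                "targetHints": {"allow": allow},
            }]
        })
    # Assumed array form of `items`: one subschema per position in the array.
    return {"type": "array", "items": item_schemas}
```

For example, `build_collection_hyper_schema([{"id": 1, "author": "alice"}, {"id": 2, "author": "bob"}], "alice")` yields two item subschemas, and only the first one hints that `PUT` and `DELETE` are allowed. The server would serve this document at whatever URI it links from the collection response, regenerating or caching it as discussed above.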
This is a false dichotomy. Hyper-Schema is not embedded, hyper-schema is linked. But hyper-schema can include per-item Link Description Objects, and the set of such things can be generated and linked dynamically for each request. So neither of the options you list here is correct. It is dynamic, but it is not embedded.
I don't know what you mean here. You seem to be assuming that the wrong schema will be linked? You are in control of what schema is linked at what time, and how accurate it is (does it reflect the runtime state at the time of request, or does it reflect all possible options, some of which may not be available at runtime? both designs are valid choices with different tradeoffs). So whether the hyper-schema is "polluted" or not (and I don't know what you mean by that) is up to you.
As shown in the draft-handrews-json-schema-hyperschema-01 examples, one recommended approach is to have an entry point resource which is either 204 No Content, or just an empty JSON object or some other "blank" resource. The point of this resource is to link your entry point JSON Hyper-Schema, which provides links to all other resources that are directly accessible from the entry point.
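The entry-point pattern can be sketched roughly as below: a "blank" resource that answers with 204 No Content and a `Link` header to its hyper-schema, plus a hyper-schema whose links lead to the other resources. All URIs, the `https://example.com/rels/comments` relation, and the `entry_point_response` helper are hypothetical and not taken from the draft's examples.

```python
def entry_point_response():
    """Return (status, headers, body) for a hypothetical API entry point."""
    headers = {
        # Hypothetical URI of the entry-point hyper-schema.
        "Link": '<https://example.com/schemas/entry-point>; rel="describedby"',
    }
    # 204 No Content: the entry point has no representation of its own.
    return 204, headers, b""


# Hypothetical entry-point hyper-schema: its links describe the resources
# that are directly reachable from the entry point.
ENTRY_POINT_HYPER_SCHEMA = {
    "base": "https://example.com/api/",
    "links": [
        {"rel": "self", "href": ""},
        # Hypothetical extension relation type for the comments collection.
        {"rel": "https://example.com/rels/comments", "href": "comments"},
    ],
}
```

A client would fetch the entry point, follow the `describedby` link, and then resolve each `href` against `base` to discover where to go next.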
Just dynamically choose which hyper-schema to link.
It is intended to be, although I would not call it quite complete yet. It is complete enough to build a usable non-trivial API, depending on exactly what you want to convey. Hyper-Schema currently does the following things:
This means that the precise usage of the templated links relies on the client understanding how these things fit together with the given protocol, most often HTTP. e.g. the body of a PUT should conform to It does not currently define exactly how to construct specific requests for specific purposes (e.g. "to turn the system on, PATCH the I'm going to ignore your last comment for now as it seems predicated on the assumption that hyper-schema is static, which is incorrect. Please let me know if that point is still unclear. |
Henry, thanks again for such a detailed reply.
This part was what I was missing. Now everything is much more clear.
Does that mean that for each request/reply for a collection, the client would have to make an additional request for the dynamic schema generated for this reply? How would one implement that in practice? Should the server store the dynamically-generated schema with some persistence, or should it generate it again as if processing a new request, recalculating everything, but this time returning the schema information instead of the actual data? Won't it imply double processing for the server, with possible synchronization issues? By the time the client makes the schema request, the server state may have changed and no longer match the earlier data request. The gap between the data and schema requests would be in the range of dozens to hundreds of milliseconds, but the server state could have changed anyway.
I was meaning that if we use
The property
So when the client prepares a POST request (creates a form for the user), it would have to know/decide what to do with the Is there a better approach for the conditional processing? My only limitation (and hence all these questions) on using Hyper-Schema for HATEOAS is to cleanly/efficiently (without double requests, double processing, etc.) indicate to the client all possible actions on a completely variable-length/variable-content (content in the sense of possible actions) collection of resources.
I meant that the schema information is not embeddable in the JSON data document and hence can't be attached to each collection element independently (please excuse my poor terminology). So, summarizing (as far as I could understand, please correct me if I'm wrong), for a variable-length collection of items of the same type, but each with a different set of possible actions, the only options available from Hyper-Schema are:
|
@anatoli26 my apologies for the long gap since your last post, things got very busy for me. Also, my apologies if I miss a point in here- I only have a bit of time to skim this and reply, and have not read back through the whole thing.
Both are possible and have their own trade-offs. Honestly, no one that I know of has gotten this dynamic with Hyper-Schema yet. I have a project for which I hope to explore these issues to get some real-world information on feasibility and best practices, but I'm not quite there yet. I'd love to hear from anyone who tries to make any of this work.
Hypermedia in general has synchronization issues. By the time you receive a response, it is by definition potentially out-of-date. All multi-resource documents (such as this GitHub issue page, consisting of various HTML, CSS, JavaScript, image, etc. resources) have the problem that later-fetched resources can be out of sync with earlier-fetched ones. The usual mechanisms for mitigating this should be employed:
I'll come back and respond to your other points when I get a bit more time. |
Hi Henry, thanks for your comments. My previous questions were based on my initial, lightweight introduction to JSON Schema. Now, as I've just finished reading the entire specification (all three parts), draft-08 as it is now in the repo + 4 open PRs, everything has become clear. A side note: while reading the spec, I got numerous stack overflows in my mind and I had to debug the core dumps by carefully studying the understanding-json-schema website. IMO, the specification itself would be much easier to understand if it had more real-world use-case examples + reformulation of some paragraphs, especially the hyper-schema part. I still have doubts about some hyper-schema aspects (especially the example in 9.4. "anchor", "base" and URI Template Resolution), mostly because the understanding-json-schema site has nothing about it and there are just a few examples in the specification itself, but I believe I can say now that I do understand the specification as such. Returning to my main question about hyper-schema and HATEOAS for a completely dynamic, variable-length, variable-state collection, actually there's no need for the last 2 alternatives from my previous post (2 requests to the server, one for the collection's instance and another for the schema describing the collection – or – for external specs, like HAL, mixed with JSON Schema). Everything could be solved with high efficiency (even less traffic overhead than with JSON Schema + HAL) by utilizing annotations and So, the idea is to define the entity schema like this:
and
Initially I didn't understand that But then comes the issue of how to define these additional properties so they don't "pollute" the "resolved" entity schema, so that when a UI generator has to prepare forms for user input it knows how to treat these fields without additional information. If we define the HATEOAS properties like any other property (like in the example above), then we'd have to craft special This would become unmaintainable with anything but the simplest systems. And today we can avoid So then I discovered the
as, according to the specification:
So the UI generation tools may have to ignore these properties. But, after checking most of the UI generation libraries, I discovered (besides the fact that most were NOT interpreting this annotation at all) that the two that did take it into account (json-editor and angular-schema-form) were showing these This is how I came to the conclusion that a new annotation, similar to My proposal is to introduce a new annotation Properties with this annotation SHOULD be excluded from user interface instance generation and the actual requests sent to the server. When the managing authority (i.e. server) doesn't expect to receive any indication of a resource state, it SHOULD ignore the properties with this annotation. [CREF1: validation of a request instance on the managing authority side could generate an assertion instead of just ignoring these properties.] Other possible names I thought of: And the
So now the server could generate the collection this way:
and the UI form generation tools would completely ignore these hypermedia properties as if they weren't defined. This way the network and CPU efficiency of this solution would be much higher than that of typical HATEOAS specifications with actions like Mason, Siren, Collection+JSON or HAL/CPHL (collection examples in the links), which normally provide an array of links for each item of a collection. If you think this annotation could be added to the specification, I can submit a PR with the description following the same writing style as the Also, I'd like to prepare examples for hyper-schema for the understanding-json-schema website with your help.
I'm actually trying to develop a CRUD system for a client, defined entirely with the JSON Schema spec, with the main objective of generating client-side functionality (UI and behavior) entirely from the schema definitions and validating user input on client and server with some JSON Schema validation library. Probably I'd have to finish this project with just the forms auto-generated, as I couldn't find any UI lib for hyper-schema, but the intention is there. |
This is not how specifications work. They are not tutorials. Someone else writes tutorials outside of the spec (ideally in the Understanding JSON Schema project, as you noted). WE WOULD ALL LOVE IT IF SOMEONE WOULD DO THAT. :-) However, I barely have time to write the spec. |
{
"type": "object",
"additionalProperties": {"type": "integer"}
}
describes a map of strings to integers (JSON property names are always strings, of course). You can more or less do "inclusion" with You are correct that the boolean schemas |
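To make the "map of strings to integers" reading concrete, here is a hand-rolled check that mirrors the `additionalProperties` schema above for already-parsed JSON. It is an illustration only, not a substitute for a JSON Schema validator; the function name is hypothetical, and it deliberately accepts only Python `int` values, leaving aside how integral floats such as `1.0` should be treated.

```python
def is_string_to_integer_map(instance) -> bool:
    """Simplified check: is `instance` an object whose values are all integers?"""
    # "type": "object" corresponds to a dict after JSON parsing.
    if not isinstance(instance, dict):
        return False
    for value in instance.values():
        # bool is a subclass of int in Python, but JSON booleans are not integers.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
    # Keys need no check: parsed JSON property names are always strings.
    return True
```

For instance, `is_string_to_integer_map({"a": 1, "b": 2})` returns `True`, while `{"a": "x"}` and `[1, 2]` are rejected.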
@anatoli26 this is getting into territory that I am not sure fits into hyper-schema, although it's probably going to be a while before I have time to dig through this and sort out the details. This is probably something you are better off discussing on slack channels (ours or apisyouwonthate) as there is a wide-ranging debate over what full hypermedia API controls should look like. I'll probably come back to this after we get draft-08 done and see where we are then. |
Since the answer to "is schema information embeddable with the instances" is "no", can we close this out or restate the question? |
Closing given the above comment. |
Sorry if this is something trivial and explained in the docs. I'm trying to understand JSON (Hyper-)Schema, but it looks like I'm missing the most important aspect of how to utilize JSON Schema & Hyper-Schema in practice for RESTful APIs.
My doubt is whether the schema information is embeddable into the instance data replies (returned by the server) or whether the two coexist in parallel. All the examples I've seen so far show the instance data and the schema describing the corresponding entities in separate structures. Is it the only way possible?
Say the client sends the request
GET /users
and my typical JSON reply would be:
So, my doubts are:
user entity with a particular JSON Schema definition (the schema definition being either embedded in the results or provided as a URI)?
With respect to the 1st question, I understand how to construct JSON Schemas. What's not clear to me is whether JSON Schemas are static data that exists independently of the instance data, so that the client should first consult the schema and only then know where to get the pure instance data and how to interpret it, OR whether there is a way to embed the schema information (via links or other ways) to indicate which entities the instance data objects correspond to.
With respect to the 2nd question, my doubt is similar, i.e. are the hyper-schema definitions also static data, so that the possible actions (via "targetHints": { "allow": [...] }) should be managed via if/then/else conditions on certain properties (so I'd have to add special ad-hoc properties indicating if the instance is read-only or read-write), OR is there a way to embed this information (again, probably via links) into the actual data, like with Mason or Hydra?
In other words, is it possible to return with JSON Hyper-Schema something equivalent to the following?