Skip to content

Define compound schema documents (#936) #977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Oct 25, 2020

Conversation

Relequestual
Copy link
Member

Define Compound Schema Document and associated concerns

Resolves #936

@karenetheridge
Copy link
Member

karenetheridge commented Aug 29, 2020

Weird, I interpreted that issue as being about something completely different - that is, what happens when there is a $schema keyword lower down in the document (that is different than the root level $schema). Bundling seems somewhat orthogonal to that concern.

Copy link
Contributor

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Relequestual (and team), great catch on the annotations and $ref issue.

I think simplifying this down to strictly bundling, and then doing a separate PR to talk about dereferencing, would make this a lot easier.

When a reference applicator is dereferenced as part of the bundling process, the Schema Resource
MUST NOT be embedded by replacing the schema object from which it was referenced, or by wrapping
in other applicator keywords. In stead, a bundled Schema Resource MUST be located in `$defs`,
where the key is the `$id` of the Schema Resource, which MUST then be referenced using `$ref`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is easier if you just say that the original reference MUST be left unchanged (but could later be changed through dereferencing, which is defined separately).

Comment on lines 1864 to 1865
to the Compound Schema Document as an instance. It is RECOMMENDED that an alternate
validation process be provided in order to validate Compound Schema Documents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be an alternate for compound schema documents, or just an alternate for schema documents (schemas against meta-schemas) in general? You should always be able to tell whether a given schema is compound, so if you know your instance is a schema that is enough.

We don't want to encourage two different meta-schema functions, because if someone validating a non-bundled schema with the "regular" meta-schema validation function, but then that schema got changed to a bundled schema, it would now require a code change to be handled correctly. We never want to require a code change to keep validation working.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consensus in the associated issue was that a compound document would be transport only, in that you can't validate it by applying a meta schema to it as an instance.
Check the issue for the reasoning and justification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Relequestual I think @handrews was just concerned that this could be misunderstood to mean that you need a separate process to validate a schema than you do to validate a compound schema. The process for validating schema documents is the process for validating compound schema documents.

I think the suggestion would be to do something like ...

It is RECOMMENDED that an alternate validation process be provided in order to validate Compound Schema Documents

The "Compound" part is implied because all schemas are potentially compound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Relequestual yes, what @jdesrosiers said.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we are on the same page now. I don't disagree.
But, I'm now not sure about a few things. I'm saying the following having not reviewed the issue again (because this is a quick comment in a few mins spare time).

I thought in the issue we said it wouldn't be possible to identify a compound schema document vs a normal schema document, although that may have been in context of being an instance.

I'm going to HAVE to review the issue discussion again. It's fine =]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought in the issue we said it wouldn't be possible to identify a compound schema document vs a normal schema document, although that may have been in context of being an instance.

Yes, that was in the context of being an instance. It's not possible to identify an instance as a schema (compound or otherwise).

@karenetheridge
Copy link
Member

Why do the referenced resources have to have absolute URI identifiers in order to be bundled? In my (admittedly simple) implementation of bundling in a json-schema-using application, which uses $refs of the form "$ref": "common_defs.json#/definitions/Foo", I just move the definition subschema over into the main document and only rewrite the definition name if it conflicts with another name -- so there are no $ids in use anywhere.

@jdesrosiers
Copy link
Member

@karenetheridge What if common_defs.json#/definitions/Foo has a local reference to another definition in common_defs.json? The reference would now point to definitions in the wrong document. Although simple transculsion works in many cases, the only way it always work is if the schema being referenced has an $id.

@karenetheridge
Copy link
Member

karenetheridge commented Aug 29, 2020

@jdesrosiers All the definitions being moved are also scanned for embedded $refs and those schemas moved across as well, and so on recursively until all definitions are moved (with $refs pointing to them rewritten as needed). It's not very hard at all as long as there are no definition names that collide, and in that case they just get a _1 or somesuch appended to them.

Just because there may be edge cases some of the time that require an $id (or would make the bundling process easier) is no reason to require an $id in all cases.

@jdesrosiers
Copy link
Member

@karenetheridge

Sorry, what I wrote doesn't make sense. Let me try again. You're right, you can bundle schemas without $ids, but it means you need to recursively rewrite all the references.

Ideally, whether a schema is bundled or not, you should get the same output. When you rewrite references, your keywordLocation and absoluteKeywordLocation values will be different. A bundle should just be a convenient delivery mechanism, it shouldn't create a new schema. I, for one, want to be able to maintain traceability to where the referenced schema originated.

Also, when you need to embed a schema of a different draft, you need to include $schema, which means you need to have an $id. There's no way around that one.

@Relequestual
Copy link
Member Author

I boradly agree with @jdesrosiers regarding tracability. It should be EASY for people to work out which of the schemas they have bundled, back to the original files. The FHIR bundled schema is comprised of a LOT of schemas. A LOT.

Also, when you need to embed a schema of a different draft, you need to include $schema, which means you need to have an $id. There's no way around that one.

I'm not sure I follow. $schema is required, but $id isn't specifically required in 2019-09 or indeed in current main branch iirc. How did you arrive at this conclusion?

@jdesrosiers
Copy link
Member

@Relequestual I didn't mean anything controversial with that statement. I just mean that without $id you can't embed a schema that is a different version as the referring schema if they are not using the same draft. Quick example ...

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "foo": { "$ref": "./schema/foo#/definitions/aaa" }
  }
}
{
  "$id": "http://example.com/schema/foo",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "aaa": { "const": "example" }
  }
}

Using the method @karenetheridge described for bundling, you would get this ...

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "foo": { "$ref": "#/definitions/aaa" }
  },
  "definitions": {
    "aaa": { "const": "example" }
  }
}

This schema will not work properly because const is not a draft-04 keyword in this context. You would have to include $schema to not lose what draft the embedded schema is using.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "foo": { "$ref": "#/definitions/aaa" }
  },
  "definitions": {
    "aaa": {
      "$schema": "http://json-schem.org/draft-07/schema#",
      "const": "example"
    }
  }
}

But, you can only use $schema at the root of the schema, so you need an $id to make #/definitions/aaa a root schema.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "foo": { "$ref": "#/definitions/aaa" }
  },
  "definitions": {
    "aaa": {
      "$id": "???",
      "$schema": "http://json-schem.org/draft-07/schema#",
      "const": "example"
    }
  }
}

But, the referenced schema has no $id for us to put there. Therefore, when embedding schemas with differing drafts, the referenced schema must be addressable with an $id.

@Relequestual
Copy link
Member Author

Thanks for clarifying @jdesrosiers, crystal clear now.
I'm going to make some changes based on this, other feedback, and discussion from the associated issue!

@Relequestual Relequestual changed the title First attempt to resolve #936 Define compound schema documents (#936) Sep 27, 2020
@Relequestual
Copy link
Member Author

I've made a few changes based on suggestions.
I still feel like something needs to be said about how $ids of embedded resources can be referenced AS IS.

@Relequestual
Copy link
Member Author

Ah, it's failing because I need to add an anchor first, then xref target that anchor. I'll fix tomorrow.

@Relequestual Relequestual force-pushed the Fix/936 branch 2 times, most recently from 34bb747 to 396ec66 Compare September 29, 2020 22:03
@Relequestual Relequestual marked this pull request as ready for review September 29, 2020 22:08
Copy link
Contributor

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good progress.

I think in general we want to be clear about the use case (bundling) vs the mechanism (compound documents) vs the notion of "dereferencing" which is actually something that happens during evaluation, at least based on how we use it elsewhere.

The question of "how do I detect and handle a compound schema document" is somewhat separate from "why would I make a compound schema document." The "why" part is important to motivate this, but once you have a compound document it doesn't matter why it was created.

</t>
<section title="Bundling">
<t>
A Compound Schema Document is created by taking references (such as "$ref")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically you can just write a compound schema document from the start, so I think a more accurate statement would be something like "A Bundled Schema is a Compound Schema Document that is created..." (idk if I'm capitalizing the right things there but that can be sorted out before publication).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm making a change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is supposed to explain how to bundle multiple schemas to form a compound schema document. Therefore I think this is OK, but I'll re-phrase to make it clearer.

within the referreing document.
</t>
<t>
Each embedded JSON Schema Resource (and the document's root) MUST identify itself
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the document's root is only a SHOULD for $id, although the embedded resources are indeed a MUST. We probably just want to talk about the embedded ones here as the document root is covered under $id.

MUST NOT be embedded by replacing the schema object from which it was referenced, or by wrapping
in other applicator keywords. Instead, a bundled Schema Resource MUST be located as a value of
a "$defs" object at the containing schemas root. The key of the "$defs" for the now embedded
Schema Resource MAY be the "$id" of the bundled schema or some other form of application defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk that this is a MAY thing.... is there really anything to enforce here beyond the obvious need for unique keys under $defs? Is this more of a best practices thing that belongs elsewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Relequestual and others added 8 commits October 6, 2020 22:56
Define Compound Schema Document and associated concerns
Fix typo

Co-authored-by: Karen Etheridge <[email protected]>
Fix typo

Co-authored-by: Karen Etheridge <[email protected]>
…d document. The phrase has a specific meaning
…nt, as it is possible to identify embedded schema resources if you know it's a schema as opposed to just an instance.
…enes need to change, and the of an embedded shema resource is still used as the referene string for by-reference applicator keywords.
$id requirements for document root are already covered, and shouldn't be overriden.
Previously 'dereferenced' was used incorrectly.
Now "bundling" is clearly defined, and MUST and MUST NOT are split for clairty.
@Relequestual
Copy link
Member Author

I've forced pushed but something has done gone wrong or I've done something wrong with the naming of the branches and I'm not seeing my changes =/

@Relequestual Relequestual deleted the Fix/936 branch October 6, 2020 22:11
@Relequestual Relequestual restored the Fix/936 branch October 6, 2020 22:11
@Relequestual Relequestual reopened this Oct 6, 2020
@Relequestual
Copy link
Member Author

Accidentally deleted the wrong branch in github UI...

@handrews
Copy link
Contributor

I think this is looking good- Any of my remaining comments are more personal style sort of things so however y'all want to resolve them (or leave as is) should be fine.

Relequestual and others added 3 commits October 20, 2020 21:36
@Relequestual
Copy link
Member Author

@karenetheridge After your further review, I think this is good to merge.

@Relequestual Relequestual merged commit 365e697 into json-schema-org:master Oct 25, 2020
@Relequestual Relequestual deleted the Fix/936 branch October 25, 2020 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specify process for validating a compound schema document against meta-schema(s)
4 participants