Clarify that we are interested in the data model. #1194

Closed
wants to merge 19 commits into from

Conversation

ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Feb 23, 2022

This PR

  • We have "JSON text" in notational conventions,
    so there's no need to reference the application/json media type
  • Since the document references an equivalence relation between JSON text, JSON value, and JSON document,
    JSON document now references the Data Model section below, so we don't need to mess with "sequence of bytes"

jdesrosiers and others added 17 commits June 4, 2021 10:58
* Change URIs to IRIs where appropriate
* Remove bookending requirement for dynamicRef

* Add $dynamicRef changes to change log
Looks like this one has been missed.

Signed-off-by: Juan Cruz Viotti <[email protected]>
It is a best practice to allow callers to override the value of these
program variables. For example, with this change, you can do:

```sh
make XML2RFC=/opt/bin/xml2rfc
```

See: https://www.gnu.org/software/make/manual/make.html#Command-Variables
Signed-off-by: Juan Cruz Viotti <[email protected]>
* Since we have "JSON text" in notational conventions, 
   we don't need to reference the application/json media type
@ioggstream ioggstream changed the title application/json is the media type of a JSON text Clarify that we are interested in the data model. Feb 23, 2022
interchangeable because of the data model it defines.
</t>
<t>
JSON Schema is only defined over JSON documents. However, any document or memory
structure that can be parsed into or processed according to the JSON Schema data
- model can be interpreted against a JSON Schema, including media types like
+ model can be interpreted against a JSON Schema, including data formats like
Contributor Author

CBOR is a data format. It can have different media types.

Good job

Comment on lines -184 to -185
A JSON document is an information resource (series of octets) described by the
application/json media type.
Contributor Author

iiuc this means "JSON documents" := "JSON text", thus resulting in a redundant definition.

Member

I think you're right. I don't think there's much value in including that sentence.

Damn

@ioggstream ioggstream marked this pull request as draft February 23, 2022 12:02
@awwright
Member

awwright commented Mar 7, 2022

The JSON Schema spec is being submitted as an Internet specification, or at least must be compatible with Internet standards. Saying "application/json media type" is intended in the first part, as it's a well-defined technical term, and using it implies something specific for interoperability on the Internet.

It's saying more than simply "JSON document := JSON text." (Where "JSON text" is the ABNF production.) A JSON document is not merely any series of bytes that matches the JSON text grammar. It also has to be marked as being a JSON document. A Web browser cannot parse a "text/plain" document as JSON just because it matches the JSON grammar.

As for using "media types" vs. "data formats", I don't care one way or the other, we don't actually need the well-defined term here. But CBOR is a media type, its content-type registration is application/cbor.
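
To make the "marked as JSON" distinction concrete, here is a minimal sketch of a receiver that only parses a payload as JSON when its declared media type says so. The function name `parse_if_json` is hypothetical, and accepting any `+json` structured suffix is an assumption made for illustration, not something the comment above prescribes.

```python
import json


def parse_if_json(body: bytes, content_type: str):
    """Parse body as JSON only when the declared media type says it is JSON."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: treat application/json and any "+json" suffix type as JSON.
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body.decode("utf-8"))
    # The bytes may match the JSON grammar, but they were not labelled as JSON.
    raise ValueError(f"not labelled as JSON: {media_type}")
```

With this sketch, the same bytes `{"a": 1}` parse under `application/json` but are rejected under `text/plain`, which is the behaviour described for a Web browser above.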

@ioggstream
Contributor Author

Hi @awwright, let me clarify + forgive the nitpicking

The JSON Schema spec is being submitted as an Internet specification, or at least must be compatible with Internet standards.

Ok

Saying "application/json media type" is intended in the first part, as it's a well-defined technical term,
and using it implies something specific for interoperability on the Internet.

My understanding of "The media type for JSON text is application/json" from RFC8259 is that they imply the same thing.

A JSON document is not merely any series of bytes that matches the JSON text grammar.
It also has to be marked as being a JSON document.
A Web browser cannot parse a "text/plain" document as JSON just because it matches the JSON grammar.

Your point is clear now. Thinking twice, imho a text/plain document is not a "JSON text" since according to RFC8259, the media type for JSON text is application/json. But I may be wrong.

As for using "media types" vs. "data formats", I don't care one way or the other,
we don't actually need the well-defined term here.

To progress with the draft, in my experience these are the kind of clarifications I was asked to resolve by the IETF community for a couple of specs I started working on, so I'm just trying to share the lesson learnt :) While I try to do my best, I can still make mistakes. Moreover, I understand that this whole discussion may seem pointless, but I think it is not from the perspective of a first-time reader.

But CBOR is a media type, its content-type registration is application/cbor.

See above wrt "pointless nitpicking": my understanding from reading CBOR is that it is a data format.
The registered media type for a single encoded CBOR data item is application/cbor.

@awwright
Member

awwright commented Mar 8, 2022

As far as I'm concerned, nit-picking is worthwhile and encouraged.

My understanding of "The media type for JSON text is application/json" from RFC8259 is that they imply the same thing.

I think this is an unfortunate case where RFC 8259 is using "JSON text" to refer to both the ABNF production (the set of all strings that form valid JSON) and a JSON document (a string that's valid JSON that's known to be application/json). Whereas I see these as different things and deserving of different terms. You're probably right, that I may be making too much of a distinction where it's not needed.

I'm not sure why RFC 8259 avoids the term "JSON document" since that's the terminology in other media types ("HTML document", "PDF document", etc). I didn't previously notice this.

See above wrt "pointless nitpicking": my understanding from reading CBOR is that it is a data format.

I think it's fair to call CBOR either a data format or a media type, since I would describe media types as standardized data formats.

@junioryo junioryo left a comment

Nice

@ioggstream ioggstream marked this pull request as ready for review March 8, 2022 15:15
@jdesrosiers
Member

The JSON Schema spec is being submitted as an Internet specification

Just to be clear, the only things we plan to submit as an internet standard are the JSON Schema media types, which are just a relatively minor extension of application/json. There are no plans to submit any particular dialect defined in the past or the future as an internet standard.

@awwright
Member

@jdesrosiers I would call JSON Schema a little more than a "relatively minor extension of application/json"... What do you mean by that? The namespace is in the standards tree, which means it must be "of general interest to the Internet community", have some sort of permanent document that completely defines its behavior, and it receives some sort of expert review.

@jdesrosiers
Member

This suggests that authors can write JSON Schema documents that have no meaning.

I've said multiple times now that the meaning comes from the dialect declaration. Does this not make sense? Do you see this as insufficient? Why?

Are you proposing an extension mechanism?

Sure, you can think of dialect declaration as an extension mechanism. The document has no intrinsic semantics. You extend those semantics by declaring a dialect. The media type defines how to declare the dialect of the schema using the $schema keyword or the schema media type parameter. That declaration is like doctype declaration in an HTML document that tells the user-agent what version of the HTML specification to use to interpret the rest of the document. The dialect declaration tells the JSON Schema implementation what version of the specification to use to interpret the schema.

But, yes, if the document doesn't declare a dialect, the document would have no meaning that a user-agent could act on. All meaning and behavior is pushed to user-space. Other than JSON Pointer fragment identification, it would be just like working with a JSON document. It wouldn't function as a schema, but that doesn't mean it's useless in some other context. You could also use this media type to declare a dialect that has nothing to do with schemas. It could have some other purpose entirely. Of course the media type isn't specifically designed for or intended to be used in these ways, but it's possible.

@awwright
Member

the meaning comes from the dialect declaration

I asked earlier: If I specify the schema {"type":[]} or {"not":{}}, how do I know what that means? More specifically, how do I determine what the "dialect" is?

@awwright
Member

the document would have no meaning that a user-agent could act on

it would be just like working with a JSON document. It wouldn't function as a schema

It sounds like what you're proposing is just JSON with Namespaces. There's similar projects that do this, like JSON-LD. I'm not necessarily against namespaces or a new media type for them, but this misses the point of JSON Schema, which is to have a single, standard vocabulary for describing assertions, annotations, and sets of JSON.

And also, most people are not interested in declaring a meta-schema or dialect for day-to-day use of JSON Schema. Different implementations doing something different for even simple statements like {"minimum":0} would be fatal.

And by the way, I see a good idea related to what you're proposing; what would it take to encode a meta-schema as a JSON-LD document, and write a validator that supports these JSON-LD schemas?

@jdesrosiers
Member

More specifically, how do I determine what the "dialect" is?

Sorry, I didn't realize that was the question you were asking. Dialect identification works the way it always has. See here for a summary. If no dialect is declared, behavior is undefined. Implementations can assume a default dialect, use heuristics to guess a dialect, throw an error, or proceed with no dialect (all properties treated as unknown keywords). I'd rather it not be undefined, but it has to be compatible with older dialects and existing implementations.
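
As a rough illustration of the options listed above, the following sketch looks for a declared dialect in the `$schema` keyword, then in a media type `schema` parameter, and otherwise either falls back to a caller-supplied default or raises. The function names are hypothetical, and the precedence of `$schema` over the media type parameter is an assumption made for the example; the discussion does not specify it.

```python
from typing import Any, Optional


def declared_dialect(schema: Any, media_type_param: Optional[str] = None) -> Optional[str]:
    """Return the dialect URI the schema declares, or None if it declares none."""
    # Assumption: an in-document "$schema" wins over the media type parameter.
    if isinstance(schema, dict) and isinstance(schema.get("$schema"), str):
        return schema["$schema"]
    return media_type_param


def choose_dialect(schema: Any,
                   media_type_param: Optional[str] = None,
                   default: Optional[str] = None) -> str:
    """Pick a dialect, using one of the fallback policies an implementation may choose."""
    dialect = declared_dialect(schema, media_type_param)
    if dialect is not None:
        return dialect
    if default is not None:
        # Policy "assume a default dialect".
        return default
    # Policy "throw an error".
    raise ValueError("no dialect declared and no default configured")
```

For a schema such as `{"type": []}` with no default configured, this sketch raises, which is one of the permitted outcomes when behavior is undefined.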

It sounds like what you're proposing is just JSON with Namespaces.

If you think of a dialect as a namespace, I guess you could say that, but that's not really the point. The point is just to define something flexible enough to support any past or future JSON Schema dialect.

the point of JSON Schema [] is to have a single, standard vocabulary for describing assertions, annotations, and sets of JSON.

I disagree that that's the goal, but even if it is, the reality is that we have many dialects of JSON Schema in use today and this media type needs to support all of the dialects people are using in the wild. Then you also have to take into account the third-party dialects out there for things like code-gen, form-gen, and databases. All of these should be able to use the media-type as well. That's why the media-type needs to be flexible.

@awwright
Member

awwright commented Mar 21, 2022

Dialect identification works the way it always has

I'm not sure what you're referring to, "dialect" is a new term to me. Usually we just called this the meta-schema.

the reality is that we have many dialects of JSON Schema in use today

Can you specifically name some of these dialects?

JSON Schema is not (or should not be) any meaningfully different from HTML... over time new features (versions) have emerged, but there is only one HTML "dialect," and newer specifications are backward-compatible. (In HTML 4.01 you could declare usage of one of a few DTDs, but these have all been superseded.) And if these features are not sufficient, you can use a different XML namespace.

@jdesrosiers
Member

I'm not sure what you're referring to, "dialect" is a new term to me. Usually we just called this the meta-schema.

The term was introduced in the 2019-09 spec.

The "$schema" keyword is both used as a JSON Schema dialect identifier and as the identifier of a [meta-schema]

Previously, this said "version" instead of "dialect", but the meaning is the same. The concept hasn't changed, only the word we use to describe it.

Can you specifically name some of these dialects?

Some examples include draft-04, draft-07, 2020-12, Open API 2 JSON Schema, Open API 3.0 JSON Schema, Open API 3.1 JSON Schema, and Mongo DB JSON Schema.

JSON Schema is not (or should not be) any meaningfully different from HTML... over time new features (versions) have emerged, but there is only one HTML "dialect," and newer specifications are backward-compatible.

I completely agree that we want to go that direction with JSON Schema, but we still have to support the current and older versions that don't work that way. I expect to end up in the same place as HTML is today. HTML no longer has versions, but it still has doctype which is still needed to differentiate it from the previous versioned releases. Similarly, I expect $schema/schema to always be needed, but it should get simpler in the future because the value never changes.

@awwright
Member

awwright commented Mar 22, 2022

My understanding is that the schema identifier is provided as a convenience, and it doesn't allow someone to unilaterally redefine a keyword to be (significantly) backwards-incompatible. But it can be used as a hint to suggest that there's custom keywords in use, or that there's new keywords that the validator doesn't understand (which is effectively the same thing from the validator's point of view).

There's a few reasons I understand it this way:

First, the meta-schema is not authoritative, just informative. The spec defines the behavior and permitted values, the meta-schema tries to describe that as closely as possible; which is why we can issue revisions of a meta-schema. Or, at least it used to work this way.

Second, the 8.1. Meta-Schemas and Vocabularies section begins by listing just two purposes, which makes the following section (8.1.1. The "$schema" Keyword) seem much less ambitious than I gather from your description.

Third, version identifiers are frequently ignored. Maybe, if we absolutely had to, we could use $schema to backtrack on functionality that we decided was a bad idea. (Similar to User-Agent sniffing, how TLS uses its version identifiers, or "quirks mode" in HTML.) But using a version identifier for something more than this is generally a bad idea.

Finally, as a general Internet principle, undefined and implementation-specific behavior should be avoided, and the idea that an application/schema+json document would have an undefined meaning until we define the dialect in use is somewhat unprecedented. There's some similar examples but I don't think they match up:

(a) xmlns changes the semantics of an XML document. In XHTML, you had to specify the xmlns=, but this isn't quite the same thing because it's the same regardless of the HTML version, and if you omit the namespace declaration, the application/xhtml+xml document would fail to validate entirely.

(b) Doctypes don't change the semantics of the document, only its grammar. I think this is essentially the same as a JSON schema. HTML 4.01 had three DTDs, a version that was reverse compatible with older HTML, a "strict" version for new documents that provided stronger guarantees of cross-platform compatibility, and a frames version because mixing frame elements with non-frame elements is nonsensical.

There's a lot of implementations out there that will never implement $vocabulary or even $schema because there's no need to. The core functionality can all be implemented, with reverse-compatibility supported all the way back to the beginning (with very niche exceptions probably not encountered in the wild).

Some examples include draft-04, draft-07, 2020-12, Open API 2 JSON Schema, Open API 3.0 JSON Schema, Open API 3.1 JSON Schema, and Mongo DB JSON Schema.

Thanks, this is helpful. I don't see a need to identify these as separate dialects. draft-04 through 2020-12 are just different releases of the meta-schema: They're a machine-readable version of the same media type specification at different points in time.

Open API 2 JSON Schema etc. don't need their own dialects either, unless they're defining custom keywords. You can implement a subset of JSON Schema functionality as long as you bail out if a schema tries to use the unsupported functionality.
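
The "bail out" approach can be sketched as a pre-flight check that walks a schema and reports every keyword outside the implemented subset. The `SUPPORTED` set and the function name are hypothetical, and the walk only descends into `properties` and schema-form `items`, which is an assumption made to keep the example short.

```python
# Hypothetical subset that a partial implementation understands.
SUPPORTED = {"type", "properties", "items", "required", "enum"}


def unsupported_keywords(schema, path=""):
    """Return JSON-Pointer-like paths of keywords the subset does not implement."""
    found = []
    if not isinstance(schema, dict):
        return found
    for key, value in schema.items():
        if key not in SUPPORTED:
            found.append(f"{path}/{key}")
        elif key == "properties" and isinstance(value, dict):
            # Each property value is itself a subschema.
            for name, sub in value.items():
                found += unsupported_keywords(sub, f"{path}/{key}/{name}")
        elif key == "items":
            # Only the single-schema form is walked in this sketch.
            found += unsupported_keywords(value, f"{path}/{key}")
    return found
```

An implementation following this idea would refuse to validate whenever the returned list is non-empty, rather than silently ignoring the unknown keywords.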

@jdesrosiers
Member

I really like this vision. It's very similar to where I want JSON Schema to end up. Where we disagree is that your interpretation can be applied to the current and historical state of JSON Schema. Every release (with the possible exception of draft-07) has had backwards incompatible changes. This makes it necessary to know which dialect defines the intended semantics for a given schema. id isn't just a deprecated version of $id. id doesn't have any effect past draft-04. $ref works differently starting in 2019-09. The only way to know how to properly interpret it is to know what draft the schema was written for. In HTML you can evaluate newer HTML against an implementation written for older HTML with just some features missing. You can also evaluate old HTML against an implementation written for the latest HTML. The same isn't true for JSON Schema. That's why we need dialects.

In the relatively near future, I'd like to declare something like https://json-schema.org/schema as the last dialect we'll ever create, commit to backwards and forwards compatibility, and continue as a living spec like HTML. But, we're not there yet and even if we were, we would still need to support the older drafts somehow, not to mention dialects for things other than validation.

Finally, as a general Internet principle, undefined and implementation-specific behavior should be avoided, and the idea that an application/schema+json document would have an undefined meaning until we define the dialect in use is somewhat unprecedented.

You're thinking about this differently than what is intended. The data in an application/json document has no semantics. It's just data. Its meaning is out-of-band knowledge baked into whatever application is using it. An application/schema+json document without a dialect would be exactly the same. In application/xml it would be the same as an XML document with no namespace, just raw values. Adding a namespace in that XML to give those values meaning is similar to declaring a dialect. That's the precedent we're building on. But, again, no one is expected to use application/schema+json this way. Technically you could use it without a dialect, but that's not what it's designed for.

@awwright
Member

Not to get this PR too off-topic, I filed json-schema-org/community#189. Sorry I didn't realize I had a somewhat different interpretation than everyone.

Every release (with the possible exception of draft-07) has had backwards incompatible changes. This makes it necessary to know which dialect defines the intended semantics for a given schema. id isn't just a deprecated version of $id. id doesn't have any effect past draft-04.

I don't think this has ever been necessary. With respect to changing spec'd behavior, my thought at the time was we would not introduce behavior that would be incompatible with older drafts; implementations could continue to implement removed behavior, from the standpoint of not breaking reverse-compatibility in their own libraries.

Additionally, declaring (and being valid against) the newer meta-schema is a way of verifying that you don't use any deprecated functionality.

And the reason that we have different drafts described in the test suite is not to suggest the behavior can change outright, but so that implementations can test backwards compatibility. My own implementation passes every test in every draft (except a few of the newer features in 2019-09), yet it doesn't read "$schema" at all. Do you have any counter-examples? Are there any tests that flatly contradict themselves between drafts?

It occurs to me now, the best way to handle backwards compatibility is not to remove the functionality from the spec, but to move it into a separate section describing deprecated functionality. The downside is this is slightly more complicated to author.

Infrequently, there were also a few features that we retconned (for example: "$ref" only being permitted where a subschema is expected), that could potentially break someone in the wild, but we're not aware of anyone using it in a way that would break (or it would be an easy fix that could be withheld until a major version increment in an implementation), and the benefits outweighed the costs.

The data in an application/json document has no semantics. It's just data. Its meaning is out-of-band knowledge baked into whatever application is using it. An application/schema+json document without a dialect would be exactly the same.

Normally I would agree application/json has no semantics, but with respect to this comparison, I would point out it sort of does: It represents a data model that can represent arrays, objects, etc. It would be strange if it did anything more than that! We sort of expect that it will gain additional meaning depending on the context it is found within.

In contrast, with JSON Schema, we do define semantics. And I wouldn't ever expect that "type" can possibly mean different things in different contexts; it's a standardized name for a standard rule.

@Relequestual
Member

Are there any tests that flatly contradict themselves between drafts?

Yes.

Take $ref for example. In draft-07 and prior, sibling keywords must be ignored.
https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/060caae0dd58e34af0449baec1103606a0ef4977/tests/draft7/ref.json#L146-L177

While in 2019-09 and above, they must NOT be ignored.
https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/060caae0dd58e34af0449baec1103606a0ef4977/tests/draft2019-09/ref.json#L146-L177

Unless a dialect is provided, how do you know the intent of the schema author?
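
The contradiction can be reproduced with a deliberately tiny evaluator. This is a sketch only: it understands just `$ref` (local pointers without escaping) and `type` for strings and integers, and the `siblings_apply` flag is a hypothetical switch standing in for "2019-09 and above" versus "draft-07 and prior".

```python
def check_type(schema, instance):
    """Apply the "type" keyword for the two types this sketch knows about."""
    expected = schema.get("type")
    if expected == "string":
        return isinstance(instance, str)
    if expected == "integer":
        return isinstance(instance, int) and not isinstance(instance, bool)
    return True  # no "type", or a type this sketch does not model


def resolve(root, ref):
    """Resolve a local reference such as "#/definitions/name" (no escaping handled)."""
    node = root
    for part in ref.lstrip("#").strip("/").split("/"):
        if part:
            node = node[part]
    return node


def is_valid(root, schema, instance, siblings_apply):
    if "$ref" in schema:
        ok = is_valid(root, resolve(root, schema["$ref"]), instance, siblings_apply)
        if not siblings_apply:
            return ok  # draft-07 style: keywords next to $ref are ignored
        return ok and check_type(schema, instance)  # 2019-09 style: siblings count
    return check_type(schema, instance)


schema = {"definitions": {"any": {}}, "$ref": "#/definitions/any", "type": "string"}
print(is_valid(schema, schema, 42, siblings_apply=False))  # True
print(is_valid(schema, schema, 42, siblings_apply=True))   # False
```

The same schema and instance give opposite results depending on the flag, so an implementation has to get that flag from somewhere, and the dialect declaration is the obvious source.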

Take format as another example. In draft-07 and prior, support was optional, but you could perform assertions if you wanted.
From 2019-09 and above, you strictly cannot provide assertions from the format keyword.

My own implementation passes every test in every draft (except a few of the newer features in 2019-09), yet it doesn't read "$schema" at all.

Unless the behaviour of JSON Schema validation is predictable and reliable, it's useless.
That's the whole reason we split format into two vocabularies... predictability and reliability is paramount.

I'm all for discussing future drafts being backwards compatible without breaking changes (when we've reached acceptable stability), but that's future, and this is now.


Open API 2 JSON Schema etc. don't need their own dialects either, unless they're defining custom keywords. You can implement a subset of JSON Schema functionality as long as you bail out if a schema tries to use the unsupported functionality.

That IS the exact reason they DO need their own dialect; OpenAPI defines their own custom keywords.

It's totally possible to create a dialect using the core and applicator vocabularies, but not the validation vocabulary.


@jdesrosiers @awwright I'm very strongly of the opinion that a "living standard" would be a huge mistake for JSON Schema.
It has already resulted in several serious security issues in terms of URLs.
(see https://json-schema.slack.com/archives/C8CQ81GKF/p1648542728065289?thread_ts=1648454348.004629&cid=C8CQ81GKF)

Show notes/transcript (from page 7): https://www.grc.com/sn/SN-853-Notes.pdf
The security paper/report: https://claroty.com/wp-content/uploads/2022/01/Exploiting-URL-Parsing-Confusion.pdf

I think "living standard" sort of works for WHATWG because the major vendors are at the table and commit to implement or reject and raise concerns over changes. The objective they have is to align the spec with the implementations, of which there are few, and over which they have control. Neither of those things are true for JSON Schema.

WHATWG reference: https://whatwg.org/faq#living-standard

I'm open to being convinced otherwise.


Any implementation which attempts to mish-mash across dialects of JSON Schema is in for a hard time, and can't provide reliable and predictable validation. IMHO, it's a travesty. I can't imagine a situation where unreliable and unpredictable validation is even close to acceptable. This is why we have the test suite, and why JSON Schema isn't dead.

@awwright
Member

Yes. Take $ref for example. In draft-07 and prior, sibling keywords must be ignored.

Ok, good point.

This is another example that I don't think is a problem in practice. In my implementation, I forgot, I do mask a few tests. It's one of those situations I described where it didn't really break anyone in the wild, and if it does, it can be implemented after a major version increment, and there's an easy fix. (In this case, if you had a draft-07 document with ignored keywords adjacent to a $ref keyword, you can rename/move them into a "$comment" keyword. This preserves the old behavior and makes it forward-compatible.)

Specifications sometimes do things like this. HTML has done this a lot. That doesn't mean suddenly there's different "dialects" of HTML.

Unless a dialect is provided, how do you know the intent of the schema author?

I had the impression we made the change understanding that case doesn't (significantly) occur in the wild. And if anything, the new behavior was found in the wild more often (because of misunderstanding).

That IS the exact reason they DO need their own dialect; OpenAPI defines their own custom keywords.

Sure, this is what I imagine as the use case for $schema. I think there's a strong case for "$schema".

In fact, I'll walk back what I said a little bit: If they want to define a "dialect" that is a strict subset of JSON Schema, that would be another good case for $schema. Then you can verify you're conforming to that subset. I think this is analogous to declaring the "HTML 4.01 Strict" DTD: it's not changing the semantics of HTML, it's just verifying you're not using deprecated behavior (e.g. frames, marquee, blink).

I'm very strongly of the opinion that a "living standard" would be a huge mistake for JSON Schema.

Fully agreed!

I'm not citing HTML for this reason. It's just that, as the biggest hypermedia format, it's easy to cite for examples. (I don't even like that name... as opposed to what, a "dead standard?" It sounds like a snub).

@Relequestual
Member

A dead standard is how WHATWG refer to the IETF RFC approach. Sure sounds like a snub.

@awwright
Member

Also, if you want to sniff "$schema" as a hint for which behavior to use, I don't think that's wrong; that's perfectly fine, maybe sometimes necessary. But that doesn't mean that the behavior has to be "implementation defined". Acknowledging that some implementations don't support all the latest features is not the same thing as saying that the implemented version of the spec is implementation defined. "Implementation defined" means something more specific than that.

@Relequestual
Member

In this case, if you had a draft-07 document with ignored keywords adjacent to a $ref keyword, you can rename/move them into a "$comment" keyword. This preserves the old behavior and makes it forward-compatible.

Unequivocally no.

You've assumed only annotation keywords adjacent to $ref.
I feel you've made many assumptions here.

Part of the work we are looking to do allows us to evaluate implementations against the test suite by ourselves, and report that to users.

Any implementation which doesn't pass all the required tests will be "not recommended".

We actually recently refused to update the support listing of an implementation for deliberately going against the specified required behaviour.

Having proper interoperability is just so critical to JSON Schema. I can't see anyone being able to convince me otherwise.

@Relequestual
Member

It's one of those situations I described where it didn't really break anyone in the wild...

I consider that line of justification having no weight here.

We have a technical specification. It clearly defines requirements. We even have a test suite.

People should be able to rely on conformance to the spec. If they cannot, it's useless.

@awwright
Member

Unequivocally no. You've assumed only annotation keywords adjacent to $ref. I feel you've made many assumptions here.

I'm not sure what you mean, I think my example was pretty solid. If you had a draft-04 schema such as {"$ref": "foo", "type":"string"}, the "type" keyword would have been ignored. To make it forward-compatible, you could rework the document to {"$ref": "foo", "$comment":{"type":"string"}}. This would preserve the intended behavior for future implementations while not changing the behavior with existing ones. The reason I greenlit this was because it would be a cheap fix, if it even broke anyone at all.
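
The rewrite described in this comment could be sketched as follows. The helper name `shelve_ref_siblings` is hypothetical, it only handles the top level of a schema, and it stores the siblings as an object exactly as in the example above. Note that the drafts specify "$comment" as a string, so a stricter variant might store `json.dumps(siblings)` instead; that variation is an assumption and not part of the original suggestion.

```python
def shelve_ref_siblings(schema):
    """Move keywords adjacent to "$ref" under "$comment" so newer drafts skip them."""
    if not isinstance(schema, dict) or "$ref" not in schema:
        return schema
    siblings = {k: v for k, v in schema.items() if k != "$ref"}
    if not siblings:
        return schema
    # Mirrors the example in the comment: siblings are kept, but no longer evaluated.
    return {"$ref": schema["$ref"], "$comment": siblings}


print(shelve_ref_siblings({"$ref": "foo", "type": "string"}))
# {'$ref': 'foo', '$comment': {'type': 'string'}}
```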

I never understood "$schema" to be how we differentiated between these behaviors. For example, what if you write your own meta-schema? Which $ref behavior gets used? If $schema decides this question, the answer is unclear. If you understand that newer drafts replace older drafts, the answer is clear.

People should be able to rely on conformance to the spec. If they cannot, it's useless.

Ok, but the spec right now seems to say that the behavior of a schema without a "$schema" keyword is undefined (i.e. implementation-defined). This is much worse than changing a behavior where almost nobody noticed.

@awwright
Member

I rewrote the OP for json-schema-org/community#189, hopefully that is much clearer and lays out some of the problems that have to be solved.

@karenetheridge
Member

karenetheridge commented Apr 5, 2022

My own implementation passes every test in every draft (except a few of the newer features in 2019-09), yet it doesn't read "$schema" at all. Do you have any counter-examples? Are there any tests that flatly contradict themselves between drafts?

In draft2020-12: "The value of "items" MUST be a valid JSON Schema. " But not so in earlier drafts -- an array was also valid (and indeed there are tests of this variant). So if your implementation accepts an array at the items keyword, it is acting in contradiction to the spec. (There is no test for this because at present there are no tests for invalid schemas.)

Another behaviour that is more limited than it used to be is the syntax of $id: a string like "#/foo/bar" was a valid $id in draft7, but later drafts prohibit that.

draft2019-09 also removed the dependencies and definitions keywords. Implementations conforming to a newer version should ignore those keywords, not evaluate them. And $recursiveRef, $recursiveAnchor and additionalItems were removed in draft2020-12.
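
A small sketch of how an implementation might encode the differences listed in this comment. The table contents come from the comment itself; the dialect labels, the ordering list, and the function names are hypothetical, and the check only inspects the top level of a schema.

```python
# Keywords removed in each release, as listed in the comment above.
REMOVED_IN = {
    "2019-09": {"dependencies", "definitions"},
    "2020-12": {"$recursiveRef", "$recursiveAnchor", "additionalItems"},
}
# Hypothetical short labels, oldest first.
ORDER = ["draft-07", "2019-09", "2020-12"]


def removed_keywords(schema, dialect):
    """List top-level keywords that the given dialect no longer defines."""
    removed = set()
    for name in ORDER[: ORDER.index(dialect) + 1]:
        removed |= REMOVED_IN.get(name, set())
    return sorted(k for k in schema if k in removed)


def items_form_allowed(schema, dialect):
    """The array form of "items" is not a valid schema value in 2020-12."""
    return not (dialect == "2020-12" and isinstance(schema.get("items"), list))
```

Both functions need the dialect as an input, which is the point being made: without knowing the draft, the evaluator cannot tell whether `definitions` or an array-valued `items` should be honoured or ignored.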

@karenetheridge
Member

I never understood "$schema" to be how we differentiated between these behaviors. For example, what if you write your own meta-schema? Which $ref behavior gets used?

The URI at $schema is first treated as a schema (either loaded from web/disk, if you support that, or it must previously be "known" to the evaluator). The document at that location must also have a $schema keyword. And so on, recursively, until we eventually get to a draft metaschema URI. That's where we determine what version of the spec we're using.

This is how the $schema keyword can be forward-compatible: we can define its meaning now, and promise that it won't be materially changed. We can release draft2022-04 tomorrow and evaluators that only support up to draft2020-12 will be able to properly give errors when they see schemas containing {"$schema":"https://json-schema.org/draft/2022-04/schema"} (load that document, if we support that, and then discover that it references itself, therefore it must be a new unknown draft specification metaschema.)

So, if you want to support your own metaschema that's not based on any existing spec version, you need to build that logic into your evaluator, that is triggered when that schema URI is seen.
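The lookup described in this comment could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not how any particular implementation works: `KNOWN_DRAFTS`, `DEFAULT_DRAFT`, `UnknownDialect`, `resolve_draft` and the `registry` mapping of preloaded schemas are all hypothetical names, schemas are assumed to be plain dicts, and defaulting to the latest supported draft when "$schema" is absent is a choice made here, not something the spec mandates.

```python
# Hypothetical table of metaschema URIs this evaluator knows natively.
KNOWN_DRAFTS = {
    "http://json-schema.org/draft-07/schema#": "draft7",
    "https://json-schema.org/draft/2019-09/schema": "draft2019-09",
    "https://json-schema.org/draft/2020-12/schema": "draft2020-12",
}
DEFAULT_DRAFT = "draft2020-12"  # assumption: fall back to the latest supported draft


class UnknownDialect(Exception):
    """Raised when the "$schema" chain never reaches a known draft metaschema."""


def resolve_draft(schema, registry):
    """Follow "$schema" from schema to metaschema until a known draft is found.

    `registry` maps URIs to already-loaded schema documents (hypothetical).
    """
    seen = set()
    while True:
        uri = schema.get("$schema")
        if uri is None:
            return DEFAULT_DRAFT
        if uri in KNOWN_DRAFTS:
            return KNOWN_DRAFTS[uri]
        if uri in seen or uri not in registry:
            # Either a cycle (e.g. a self-referencing metaschema for a draft we
            # do not know) or a document we cannot load: report an error.
            raise UnknownDialect(uri)
        seen.add(uri)
        schema = registry[uri]
```

With this sketch, a custom metaschema that itself declares the 2020-12 metaschema resolves to "draft2020-12", while a self-referencing metaschema for an unknown future draft raises `UnknownDialect` instead of being evaluated with guessed semantics.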

@awwright
Member

awwright commented Apr 5, 2022

But not so in earlier drafts -- an array was also valid

But supporting the schema form is not mutually exclusive with supporting the array form. Is there an example where supporting a current feature precludes supporting an older one?

The example of properties adjacent to $ref was given (previously they were ignored, now they're evaluated)— but my issues with this example are (1) "$ref" is a core keyword, its behavior doesn't change with respect to "$schema", and (2) nobody was using keywords next to $ref and expecting them to not do anything. We changed it because people were using them and they were expecting it to do something.

Implementations conforming to a newer version should ignore those keywords, not evaluate them.

No, it is perfectly legal to support features that have since been removed from the spec. This is how we support reverse compatibility with older drafts. We don't redefine behavior that's been specified in older drafts, it only becomes undefined. For example, we would not be able to redefine the array form of "items" with a different meaning. In any event, "definitions" was always technically ignored (it was merely reserved).

The document at that location must also have a $schema keyword.

This isn't specified; I thought this was what $vocabulary was for. A meta-schema can provide its own URI as its schema, or omit it entirely. For example, hyper-schema references itself.

@karenetheridge
Member

Is there an example where supporting a current feature precludes supporting an older one?

I gave you several -- are you not counting where earlier versions of the spec supported something and now it doesn't, even if it now says "MUST NOT"? That's a bit disingenuous don't you think? Are you suggesting that it is never possible to remove a keyword entirely, because there isn't a conflict if an implementation chooses to keep implementing it anyway? And some implementations continuing to support old keywords while others don't will lead to unpredictable behaviour in the wild, which is exactly what we're trying to prevent by having a specification document.

"$ref" is a core keyword, its behavior doesn't change with respect to "$schema"

That's nothing to do with it being a core keyword. We've changed core keyword behaviours several times, and may do so again in the future. All we really need to lock down between versions is $id and $schema, and everything else flows from that.

No, it is perfectly legal to support features that have since been removed from the spec.

Maybe, if the schema doesn't explicitly contain a $schema keyword, but strictly speaking no, you should pick one version of the spec (preferably the latest, unless the user specifically overrides that), and stick to it.

In any event, "definitions" was always technically ignored

It wasn't, because schemas underneath definitions (and now $defs) needed to be examined for identifiers ($id and $anchor keywords) that other schemas elsewhere might want to reference.
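To make that concrete, here is a small Python sketch of the kind of scan involved. The function name `collect_identifiers` is hypothetical, and the walk is deliberately simplified: it only descends into "definitions" and "$defs", and it records identifier values as written without resolving relative URIs against a base.

```python
def collect_identifiers(schema, found=None):
    """Gather "$id" and "$anchor" values from a schema and its definitions.

    Simplified on purpose: a real implementation would also walk every other
    subschema location and resolve each "$id" against its base URI.
    """
    if found is None:
        found = {}
    if isinstance(schema, dict):
        for keyword in ("$id", "$anchor"):
            value = schema.get(keyword)
            if isinstance(value, str):
                found.setdefault(keyword, []).append(value)
        for container in ("definitions", "$defs"):
            subschemas = schema.get(container)
            if isinstance(subschemas, dict):
                for subschema in subschemas.values():
                    collect_identifiers(subschema, found)
    return found


example = {
    "$id": "https://example.com/root",
    "definitions": {"a": {"$id": "a.json"}},
    "$defs": {"b": {"$anchor": "b"}},
}
print(collect_identifiers(example))
```

An evaluator that skipped "definitions" entirely would never register "a.json", so a "$ref" to it from elsewhere could not be resolved.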

A meta-schema can provide its own URI as its schema

If it does, it better be one of the official draft specification metaschemas, or some other metaschema that is "known" to the implementation as the root source of keyword behaviours. Hyperschema is one - it has its own specification document, and cannot be implemented solely in terms of JSON Schema draft on its own.

I suppose a metaschema could omit the $schema keyword entirely; the spec really should cover how this is handled. I would assume to default to the same behaviour as when a non-meta schema omits the keyword - i.e. generally default to the latest supported draft version. But my point is that schemas referencing each other as metaschemas is a recursive process that has to end at an official spec metaschema eventually.

@awwright
Member

awwright commented Apr 5, 2022

are you not counting where earlier versions of the spec supported something and now it doesn't, even if it now says "MUST NOT"?

The use of "MUST NOT" in the "$id" definition is a limitation on what authors & implementations may produce; this doesn't imply that validators must produce an error if they encounter the prohibited form. They can still support the older behavior, because it's not ambiguous.

As far as I know, none of your other examples mention MUST NOT.

Are you suggesting that it is never possible to remove a keyword entirely, because there isn't a conflict if an implementation chooses to keep implementing it anyway? And some implementations continuing to support old keywords while others don't will lead to unpredictable behaviour in the wild, which is exactly what we're trying to prevent by having a specification document.

Implementations can continue to support functionality removed from the spec, as a matter of not breaking reverse compatibility in their implementation.

This is not to imply that old behavior is cross-platform, of course. If we want to support deprecated functionality in a cross-platform way, we would have to write it into the spec.

We've changed core keyword behaviours several times, and may do so again in the future.

Right, but there's nothing in the specification that says the behavior changes based on the $schema keyword.

Maybe, if the schema doesn't explicitly contain a $schema keyword, but strictly speaking no, you should pick one version of the spec (preferably the latest, unless the user specifically overrides that), and stick to it.

This is the basis for my issue json-schema-org/community#189. Your suggestion seems reasonable, but the spec currently suggests that validators encountering schemas without a "$schema" keyword may define any arbitrary behavior, including breaking behavior; and this sounds unreasonable to me.

It wasn't, because schemas underneath definitions (and now $defs) needed to be examined for identifiers

Agreed, this is an important clarification.

If it does, it better be one of the official draft specification metaschemas

This makes sense, but this behavior isn't specified in the spec.

@awwright
Member

awwright commented Apr 6, 2022

Are you suggesting that it is never possible to remove a keyword entirely

Also, if by "entirely" you mean we have to pay attention to behavior long since removed—yes. This is generally the case. For example, HTTP is probably never going to redefine behavior for 402, 419, or 420.

@jdesrosiers jdesrosiers changed the base branch from draft-next to main July 8, 2022 15:30
@jdesrosiers
Member

The draft-next branch has been merged and is now closed. The merge target for this PR has been changed to main. Here are the recommended steps to get your branch rebased properly.

  1. Make sure your remote for the json-schema-org/json-schema-spec repo is up-to-date. (Example: git fetch upstream).
  2. Rebase your commits onto main. (Example: git rebase --onto upstream/main abcd123~1 (replace abcd123 with the commit hash of the first commit in your PR)).
  3. Force push the rebased branch to your fork. (Example: git push --force origin my-branch).

@handrews
Contributor

@ioggstream could you please re-submit your changes as a new branch from main if you are still interested in having them reviewed and adopted? Between the removal of draft-next and the very long tangential digression of the comments discussion (which is better covered in #1242 and should not be continued here), the PR in this form is too confusing to finalize.

I would recommend splitting the added xref (which seems straightforward as a PR) from the debate over what is or is not a data format, which could probably use clarification through discussion in an issue since there did not seem to be a consensus here.

@handrews handrews closed this Aug 14, 2022
@ioggstream
Contributor Author

Hi, I think I could do it for September;) adding a note for that.

@ioggstream
Contributor Author

@handrews since you only wanted the editorial parts, it was easy (see #1273 #1274 ). CBOR defines itself as a data format, so I think we can trust it :) https://cbor.io/

@ioggstream ioggstream deleted the patch-1 branch August 14, 2022 21:24