Split $anchor out of $id, define canonical vs shadowed URI behavior #780

handrews · 2019-08-16T01:23:19Z

[EDIT: I pushed a commit Saturday just before 1PM california time that I think simplifies things considerably- see the commit log message for details.]

THERE ARE THREE COMMITS (originally, and then more added)
While this is all one coherent change, and the commits don't fully stand alone, I have organized them to make reviewing a bit easier.

1. First I define a schema resource
2. Then I deal with URI shadowing ("$id": Eliminate base URI shadowing #726) somewhat obsoleted by the 4th commit
3. Then I deal with $anchor (Make $id conform to RFC 3986 suggestion for base URI elements #729) and update the examples
4. Feedback on URI shadowing and fragments in $id
5. Update meta-schemas
6. Fix more examples
7. Add changelog

Fixes #729. Fixes #726.

This implements the recent decision to strongly discourage using fragments that cross a base URI change (e.g. an "$id" appearance).

This is done by describing schemas identified by absolute URIs as resources (per the literal definition of Uniform Resource Identifier), and declaring the URI resulting from the "$id" to be the resource's canonical URI.

While URIs for subschemas with "$id" and their children can be constructed using the base URI from a parent schema, these URIs are non-canonical, and their behavior is undefined. This is on the grounds that switching between embedding and referencing schema resources should behave essentially identically.

A difficulty with how annotation collection works in the event of such a switch is noted in a CREF (related to #779).

This also adds the $anchor keyword in place of the fragment-only form of $id, and updates the schema identification examples for $anchor and canonical vs non-canonical URIs.
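To make the canonical vs non-canonical distinction concrete, here is a minimal Python sketch that lists the canonical URIs of every subschema in a document. It is an illustration under stated assumptions, not the algorithm from the spec: the helper name `canonical_uris` and the example.com URLs are hypothetical, and the walk only descends into "$defs" to stay short.

```python
from urllib.parse import urljoin


def canonical_uris(schema, base_uri, pointer="", out=None):
    """Map canonical URIs to subschemas (hypothetical helper, "$defs" only)."""
    if out is None:
        out = {}
    if not isinstance(schema, dict):
        return out
    if "$id" in schema:
        # "$id" starts a new schema resource: the base URI changes and
        # JSON Pointer fragments restart from the root of that resource.
        base_uri = urljoin(base_uri, schema["$id"]).split("#")[0]
        pointer = ""
    out[base_uri + "#" + pointer] = schema
    if "$anchor" in schema:
        # A plain-name fragment is relative to the enclosing resource.
        out[base_uri + "#" + schema["$anchor"]] = schema
    for key, value in schema.get("$defs", {}).items():
        escaped = key.replace("~", "~0").replace("/", "~1")  # JSON Pointer escaping
        canonical_uris(value, base_uri, pointer + "/$defs/" + escaped, out)
    return out


example = {
    "$id": "https://example.com/root.json",
    "$defs": {
        "A": {"$anchor": "foo"},
        "B": {"$id": "other.json", "$defs": {"X": {"$anchor": "bar"}}},
    },
}

uris = canonical_uris(example, "")
for uri in sorted(uris):
    print(uri)
```

In this sketch, a URI such as `https://example.com/root.json#/$defs/B/$defs/X` never appears in the result, because it crosses the "$id" in `B`. That is the kind of non-canonical URI this PR discourages.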

jdesrosiers

I'm glad you were able to get this in!

I found a few errors in the example where fragments were incorrectly labeled as "undefined behavior".

gregsdennis · 2019-08-16T03:46:48Z

Does the $anchor functionality differ from that of $id when $id's value is "anchor-like"? (i.e., is this existing functionality just split out to another keyword?) If so, does $id no longer support what $anchor now does?

handrews · 2019-08-16T04:59:01Z

@gregsdennis "$anchor": "foo" has the same effect that "$id": "#foo" used to have.

You can no longer write "$id": "#foo", as the spec says that the value of $id MUST resolve to an absolute-URI (without a fragment).
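As a rough illustration of that equivalence, the following Python sketch rewrites fragment-only "$id" values into "$anchor". The function name `migrate_fragment_ids` is hypothetical, and the walk is deliberately naive: it visits every nested object, so it would also rewrite an "$id" string that appears inside instance data such as "const" or "examples". Treat it as a starting point rather than a complete migration tool.

```python
def migrate_fragment_ids(schema):
    """Return a copy in which fragment-only "$id" values become "$anchor".

    Hypothetical helper: a naive walk that treats every nested object
    as a schema, which is an assumption that does not hold for
    instance data embedded in keywords like "const" or "examples".
    """
    if isinstance(schema, list):
        return [migrate_fragment_ids(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    result = {}
    for key, value in schema.items():
        is_fragment_only = (
            key == "$id"
            and isinstance(value, str)
            and value.startswith("#")
            and len(value) > 1
        )
        if is_fragment_only:
            # "$id": "#foo" used to mean what "$anchor": "foo" means now.
            result["$anchor"] = value[1:]
        else:
            result[key] = migrate_fragment_ids(value)
    return result


old = {"$defs": {"a": {"$id": "#foo", "type": "string"}}}
new = migrate_fragment_ids(old)
```

An "$id" that combines a full URI with a non-empty fragment is left untouched by this sketch, since splitting it into "$id" plus "$anchor" is a judgment call for the schema author.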

But I agree that that's not entirely obvious from the current PR. I will add something to clarify the change. It might be a CREF rather than main text, and it will definitely be in the changelog... which I forgot to write. Lemme add that...

jdesrosiers

I'm removing my request for changes, because it was a misunderstanding. I'm sure the misunderstanding was due more to my preconceived notions of the concept getting in the way than it was about failure to explain things well. Others will have to decide if it's explained clearly.

handrews · 2019-08-16T16:26:48Z

@jdesrosiers thanks- you spotted a legitimately confusing case, and that is very helpful! I'm honestly not sure what the right thing to do is with the fragment URI that expresses the location of the embedded schema resource in the parent schema resource. It does not seem right to say that it is undefined, so let's see how this wording holds up and whether I need to tweak it further.

Relequestual

Overall I think this makes sense.

I've made a few comments which are semantics only.

In terms of killing URI shadowing (Resolving #726), could you clarify for me, given the example, a reference which would be invalid? I was trying to think of one, but I can't quite wrangle my head around this...

Additionally, do we want to explicitly "NOT RECOMMEND[ED]" using the non-canonical URI given support is optional (If I read this correctly)

This lets us talk more clearly about what an "$id" really is, as will be seen in the next commit. It also introduces the idea of a schema resource having a canonical URI, which is important for explaining why JSON Pointer fragments relative to parent base URIs (meaning that they "cross" a subschema "$id") should have undefined behavior.

This implements the recent decision to strongly discourage using fragments that cross a base URI change (e.g. an "$id" appearance). This is done by describing schemas identified by absolute URIs as resources (per the literal definition of Uniform Resource Identifier), which was done in the previous commit, and declaring the URI resulting from the "$id" to be the resource's canonical URI. While URIs for subschemas with "$id" and their children can be constructed using the base URI from a parent schema, these URIs are non-canonical, and their behavior is undefined. This is on the grounds that switching between embedding and referencing schema resources should behave essentially identically. A difficulty with how annotation collection works in the event of such a switch is noted in a CREF.

This adds the $anchor keyword in place of the fragment-only form of $id, and updates the schema identification examples for $anchor and canonical vs non-canonical URIs.

$id cannot contain a fragment except for an empty one, mostly because a lot of people do that out of habit it seems. This also reworks the wording around non-canonical URIs to avoid an overly broad "undefined behavior." This still doesn't feel ideal but is hopefully an improvement.
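The empty-fragment allowance can be sketched as a small normalization step. This is only an illustration: `normalize_id` is a hypothetical name, and a real implementation would also resolve the value against the current base URI before checking it.

```python
from urllib.parse import urldefrag


def normalize_id(value):
    """Drop an empty trailing fragment from an "$id"; reject a non-empty one.

    Hypothetical helper: it checks only the fragment and does not
    resolve the value against a base URI.
    """
    url, fragment = urldefrag(value)
    if fragment:
        raise ValueError(f'"$id" must not contain a non-empty fragment: {value!r}')
    return url


with_hash = normalize_id("https://example.com/schema#")
without_hash = normalize_id("https://example.com/schema")
```

Both calls return the same string, which matches the idea that a trailing "#" written out of habit identifies the same resource.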

Now that the core spec explicitly defines this as the default, let's call it that. Because calling it the "core and validation" meta-schema is annoying.

Since we're now very strongly discouraging fragments in $id, let's not use them in $schema either. It works either way, but I like the consistency. Stylistically, referring to "#" internally makes sense, while using an absolute-URI per RFC 3986 (no fragment) makes sense externally.

handrews · 2019-08-22T01:43:17Z

The update I just pushed is simply a rebase onto master to fix the conflicts from merging the other PRs. The only conflict was the $schema and $id values of the vocab example (they were totally changed in the example PR, and I just had the trailing # in this PR).

There shouldn't be any significant change to the diff, as I made a point to keep the PRs as non-overlapping as possible.

handrews · 2019-08-23T15:11:34Z

@gregsdennis I answered the one question you have- are you OK with this change now or have you not had time to finish reviewing?

@johandorland @Julian @KayEss are any of you planning to weigh in? If so, can you give an estimated date of when you might get to it? If not I'll just wait for at least @Relequestual and @awwright (and @philsturgeon if he ever gets back to a laptop from his bike tour 🚴 ).

This is the last big mandatory thing, and is a prerequisite for the last optional proposal I'm still working on, and we'd really like to get to publishing, so... reviews are needed! I know it's summer and folks are busy and on vacation and stuff, so letting me know when you might get to it would be very helpful.

Julian · 2019-08-23T15:31:23Z

Thumbs up from me! Thanks as usual @handrews (both for a good change and for managing to catch my attention :/)

gregsdennis · 2019-08-24T03:30:56Z

My question is answered

handrews · 2019-08-24T07:37:12Z

[EDIT: never mind, instead of a new PR I split out a smaller part and added it to this PR as another commit- here is the message]

This further clarifies the schema resource concept and
how it is used. It provides an example to simplify
the explanation of how non-canonical URIs can be
problematic.

This also relaxes a restriction on "$schema" that has no functional
impact but avoids requiring implementations to detect and handle
a rather complex case (embedding schema resources with different
"$schema" values). And also means that embedding can work without
having to change the embedded value by removing the "$schema"
keyword.

Additional best practices for embedding will follow in a separate
commit.

Relequestual · 2019-08-27T08:25:40Z

I'll plan to review changes this week.

Relequestual · 2019-09-03T08:31:42Z

My suggestion of explicitly suggesting schema authors avoid non-canonical URIs still stands.
Edit, I worked out how to add suggestions as part of doing a review!

Relequestual

bop
Looks like it's good to go! 🚀

awwright · 2019-09-10T07:12:44Z

Our naming of "$anchor" here is a little bit unfortunate, in the rest of the Web it's backwards: "id" is a plain label (like a CNAME) and "anchor" is a URI.

(edit) Except apparently in RFC XML.

awwright · 2019-09-10T07:20:12Z

By and large this seems like it'll be more accessible to Web developers than it was before; but the biggest hurdle I have is figuring out the significance of a "canonical URI".

Why not define a rel=self link, instead of a rel=canonical relationship?

handrews · 2019-09-10T16:49:13Z

@awwright regarding our $anchor vs HTML's id (and I guess XML's? I've never looked): Yeah, that's unfortunate, and I noticed that. I'm not sure what to do about it. This part of our $id was never as widely used so it feels safer to change.

Also, if we tried to change "$id": "#foo" to "$id": "foo" (note the change from URI-reference to plain label) while changing "$id": "https://example.com/bar" to "$somethingElse": "https://example.com/bar" that would seem to be even more confusing, as no current use of $id would be left unchanged.

I'm open to suggestion, including leaving $id the way this PR leaves it but using a different name for $anchor, but I couldn't come up with anything better than this. At least $anchor is reasonably accurate in the sense that in older versions of HTML the idiom <a name="bar"> was referred to as an anchor. But I did not want to use $name as that seemed bad for some other reason that escapes me at the moment.

Would $name be better? And if so, should $recursiveAnchor be changed as well?

handrews · 2019-09-10T16:51:42Z

@awwright

> By and large this seems like it'll be more accessible to Web developers than it was before; but the biggest hurdle I have is figuring out the significance of a "canonical URI".
>
> Why not define a rel=self link, instead of a rel=canonical relationship?

Can you elaborate on what that would look like? I found the "canonical" language helpful as the specific issue is understanding that, out of multiple possibly usable URIs, this is the one that should be used and the only one that can really be relied on.

Do you think that discussing this as a "self" relation instead is more accurate? And (at least as important), do you think that more people will find discussion in terms of "self" intuitive?

awwright · 2019-09-10T22:39:09Z

I'll sit down later and compare the two link relations, but off the top of my head, rel=self is more accurate.

My understanding of rel=canonical is more like "these are technically different resources, but the preferred way to access it is via that one over there". Mostly for search engine optimization, so you can tell that two different pages actually contain the same blog article.

handrews · 2019-09-10T23:06:45Z

@awwright thanks, I'll also take a look at the wording and see if using "self" makes sense to me. I do follow your reasoning here.

handrews · 2019-09-14T03:23:20Z

To follow up on a discussion with @awwright on slack, this was resolved in favor of keeping "canonical" through the following reasoning:

After a discussion of how "self" and "canonical" might differ (a resource accessed through URI X might give X as its "self" but Y as its "canonical", which seems to be a thing that happens with web pages at least based on SEO guidance), @awwright asked:

Do we want to convey a “most correct URI”, if any of the URIs can suffice?

To which my response was yes, that was ultimately the decision from issue #726: "$id": Eliminate base URI shadowing

@awwright I'm taking your "I suppose that makes sense" from yesterday as approval since you didn't follow up with any further question or response. If that is an error, please let me know and I can make a new PR to address any remaining concerns.

handrews · 2019-09-14T03:25:12Z

Also, while I agree that our naming of $id and $anchor is somewhat unfortunate, that's come up before and no one has had a better idea then or now. I wish we could figure something out, but we do want to preserve the more common uses of $id.

handrews added Type: Enhancement core labels Aug 16, 2019

handrews added this to the draft-08 milestone Aug 16, 2019

handrews requested a review from a team August 16, 2019 01:23

handrews mentioned this pull request Aug 16, 2019

Allow "$ref" to take a schema, to make inlining easier? #779

Closed

jdesrosiers requested changes Aug 16, 2019

jdesrosiers reviewed Aug 16, 2019

handrews changed the base branch from canon-base to master August 16, 2019 16:24

Relequestual reviewed Aug 16, 2019

Relequestual requested changes Aug 19, 2019

This was referenced Aug 19, 2019

Improve type safety with restricted component references OAI/OpenAPI-Specification#1991

Closed

Clarify the nature of schema placeholder keywords #783

Merged

handrews added 10 commits August 21, 2019 18:39

Split $anchor from $id, update examples

7656e94

This adds the $anchor keyword in place of the fragment-only form of $id, and updates the schema identification examples for $anchor and canonical vs non-canonical URIs.

Add $anchor to meta-schema, update $id

4f110e7

Fix more examples for $anchor and canonical $id

e1e3441

Changelog for $id and $anchor

5ecfaa1

Missed a xref change to $anchor

428be3c

Refer to the "default" meta-schema

472ed6b

Now that the core spec explicitly defines this as the default, let's call it that. Because calling it the "core and validation" meta-schema is annoying.

handrews force-pushed the canon branch from 9a1b826 to 2da139a Compare August 22, 2019 01:41

handrews mentioned this pull request Aug 24, 2019

More consistent concept of schema resources #788

Closed

Relequestual reviewed Sep 2, 2019

Relequestual reviewed Sep 3, 2019

handrews mentioned this pull request Sep 5, 2019

Move core examples/best practices to appendixes #791

Merged

philsturgeon approved these changes Sep 5, 2019

handrews and others added 2 commits September 4, 2019 20:42

Update jsonschema-core.xml

737ee16

Co-Authored-By: Ben Hutton <[email protected]>

"interoperability" rather than "transportability"

ccf4426

We use "interoperability" for this throughout the spec.

Relequestual reviewed Sep 5, 2019

Update jsonschema-core.xml

4c0d67a

Co-Authored-By: Ben Hutton <[email protected]>

Relequestual approved these changes Sep 5, 2019

handrews merged commit 8f47368 into json-schema-org:master Sep 14, 2019

handrews deleted the canon branch April 26, 2020 16:49
