-
-
Notifications
You must be signed in to change notification settings - Fork 315
Split $anchor out of $id, define canonical vs shadowed URI behavior #780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you were able to get this in!
I found a few errors in the example where fragments were incorrectly labeled as "undefined behavior".
Does the |
@gregsdennis You can no longer write But I agree that that's not entirely obvious from the current PR. I will add something to clarify the change. It might be a CREF rather than main text, and it will definitely be in the changelog... which I forgot to write. Lemme add that... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm removing my request for changes, because it was a misunderstanding. I'm sure the misunderstanding was due more to my preconceived notions of the concept getting in the way than it was about failure to explain things well. Others will have to decide if it's explained clearly.
@jdesrosiers thanks- you spotted a legitimately confusing case, and that is very helpful! I'm honestly not sure what the right thing to do is with the fragment URI that expresses the location of the embedded schema resource in the parent schema resource. It does not seem right to say that it is undefined, so let's see how this wording holds up and whether I need to tweak it further. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this makes sense.
I've made a few comments which are semantics only.
In terms of killing URI shadowing (Resolving #726), could you clarify for me, given the example, a reference which would be invalid? I was trying to think of one, but I can't quite wrangle my head around this...
Additionally, do we want to explicitly "NOT RECOMMEND[ED]" using the non-canonical URI given support is optional (If I read this correctly)
This lets us talk more clearly about what an "$id" really is, as will be seen in the next commit. It also introduces the idea of a schema resource having a canonical URI, which is important for explaining why JSON Pointer fragments relative to parent base URIs (meaning that they "cross" a subschema "$id") should have undefined behavior.
This implements the recent decision to strongly discourage using fragments that cross a base URI change (e.g. an "$id" appearance). This is done by describing schemas identified by absolute URIs as resources (per the literal definition of Univeral Resource Identifier), which was done in the previous commit, and declaring the URI resulting from the "$id" to be the resource's canonical URI. While URIs for subschemas with "$id" and their chidren can be constructed using the base URI from a parent schema, these URIs are non-canonical, and their behavior is undefined. This is on the grounds that switching between embedding and referencing schema resources should behave essentially identically. A difficulty with how annotation collection works in the event of such as switch is noted in a CREF.
This adds the $anchor keyword in place of the fragment-only form of $id, and updates the schema identification examples for $anchor and canonical vs non-canonical URIs.
$id cannot contain a fragment except for an empty one, mostly because a lot of people do that out of habit it seems. This also reworks the wording around non-canonical URIs to avoid an overly broad "undefined behavior." This still doesn't feel ideal but is hopefully an improvement.
Now that the core spec explicitly defines this as the default, let's call it that. Because calling it the "core and validation" meta-schema is annoying.
Since we're now very strongly discouraging fragments in $id, let's not use them in $schema either. It works either way, but I like the consistency. Stylistically, referring to "#" internally makes sense, while using an absolute-URI per RFC 3986 (no fragment) makes sense externally.
The update I just pushed is simply a rebase onto master to fix the conflicts from merging the other PRs. The only conflict was the There shouldn't be any significant change to the diff, as I made a point to keep the PRs as non-overlapping as possible. |
@gregsdennis I answered the one question you have- are you OK with this change now or have you not had time to finish reviewing? @johandorland @Julian @KayEss are any of you planning to weigh in? If so, can you give an estimated date of when you might get to it? If not I'll just wait for at least @Relequestual and @awwright (and @philsturgeon if he ever gets back to a laptop from his bike tour 🚴 ). This is the last big mandatory thing, and is a prerequisite for the last optional proposal I'm still working on, and we'd really like to get to publishing, so... reviews are needed! I know it's summer and folks are busy and on vacation and stuff, so letting me know when you might get to it would be very helpful. |
Thumbs up from me! Thanks as usual @handrews (both for a good change and for managing to catch my attention :/) |
My question is answered |
[EDIT: never mind, instead of a new PR I split out a smaller part and added it to this PR as another commit- here is the message] This further clarifies the schema resource concept and This also relaxes a restriction on "$schema" that has no functional Additional best practices for embedding will follow in a separate |
This further clarifies the schema resource concept and how it is used. It provides an example to simplify the explanation of how non-canonical URIs can be problematic. This also relaxes a restriction on "$schema" that has no functional impact but avoids requiring implementations to detect and handle a rather complex case (embedding schema resources with different "$schema" values). And also means that embedding can work without having to change the embedded value by removing the "$schema" keyword. Additional best practices for embedding will follow in a separate commit.
I'll plan to review changes this week. |
My suggestion of explicitly suggesting schema authors avoid non-canonical URIs still stands. |
Co-Authored-By: Ben Hutton <[email protected]>
We use "interoperability" for this throughout the spec.
Co-Authored-By: Ben Hutton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bop
Looks like it's good to go! 🚀
Our naming of "$anchor" here is a little bit unfortunate, in the rest of the Web it's backwards: "id" is a plain label (like a CNAME) and "anchor" is a URI. (edit) Except apparently in RFC XML. |
By and large this seems like it'll be more accessible to Web developers than it was before; but the biggest hurdle I have is figuring out the significance of a "canonical URI". Why not define a rel=self link, instead of a rel=canonical relationship? |
@awwright regarding our Also, if we tried to change I'm open to suggestion, including leaving Would |
Can you elaborate on what that would look like? I found the "canonical" language helpful as the specific issue is understanding that, out of multiple possibly usable URIs, this is the one that should be used and the only one that can really be relied on. Do you think that discussing this as a "self" relation instead is more accurate? And (at least as important), do you think that more people will find discussion in terms of "self" intuitive? |
I'll sit down later and compare the two link relations, but off the top of my head, rel=self is more accurate. My understanding of rel=canonical is more like "these are technically different resources, but the preferred way to access it is via that one over there". Mostly for search engine optimization, so you can tell that two different pages actually contain the same blog article. |
@awwright thanks, I'll also take a look at the wording and see if using "self" makes sense to me. I do follow your reasoning here. |
To follow up on a discussion with @awwright on slack, this was resolved in favor of keeping "canonical" through the following reasoning: After a discussion of how "self" and "canonical" might differ (a resource accessed through URI X might give X as its "self" but Y as its "canonical", which seems to be a thing that happens with web pages at least based on SEO guidance), @awwright asked:
To which my response was yes, that was ultimate the decision from issue #726: @awwright I'm taking your "I suppose that makes sense" from yesterday as approval since you didn't follow up with any further question or response. If that is an error, please let me know and I can make a new PR to address any remaining concerns. |
Also, while I agree that our naming of |
[EDIT: I pushed a commit Saturday just before 1PM california time that I think simplifies things considerably- see the commit log message for details.]
THERE ARE THREE COMMITS (originally, and then more added)
While this is all one coherent change, and the commits don't fully stand alone, I have organized them to make reviewing a bit easier.
$anchor
(Make$id
conform to RFC 3986 suggestion for base URI elements #729) and update the examples$id
Fixes #729. Fixes #726.
This implements the recent decision to strongly discourage using fragments that cross a base URI change (e.g. an
"$id"
appearance).This is done by describing schemas identified by absolute URIs as resources (per the literal definition of Univeral Resource Identifier), and declaring the URI resulting from the
"$id"
to be the resource's canonical URI.While URIs for subschemas with
"$id"
and their children can be constructed using the base URI from a parent schema, these URIs are non-canonical, and their behavior is undefined. This is on the grounds that switching between embedding and referencing schema resources should behave essentially identically.A difficulty with how annotation collection works in the event of such as switch is noted in a CREF (related to #779)
This also adds the $anchor keyword in place of the fragment-only form of
$id
, and updates the schema identification examples for$anchor
and canonical vs non-canonical URIs.