Skip to content

Clarify how to combine boolean and numeric "items" annotations #904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ssilverman opened this issue May 1, 2020 · 8 comments
Closed

Clarify how to combine boolean and numeric "items" annotations #904

ssilverman opened this issue May 1, 2020 · 8 comments

Comments

@ssilverman
Copy link
Member

ssilverman commented May 1, 2020

I'm implementing the "items" annotations and had some questions:

  1. Is it correct to assume that when combining Boolean and numeric "items" annotations, we first
    convert a "true" Boolean annotation value to the size of the array minus one? A "false" annotation value I think would convert to -1.
  2. Why is the annotation the max. index as opposed to the number of items the schema was applied to? A count might be much clearer and avoid the weird potential "-1" interpretation above, for example, when an array-style "items" is applied to an empty array, or an empty array "items" is applied to any array.
  3. Am I correct in reading the annotation value as "max. index of application regardless of success" and not "max. index of successful validation"?

See: 9.3.1.1. items

@ssilverman
Copy link
Member Author

@jdesrosiers I believe, if I'm reading it correctly, that this line in your validator uses a size and not the index: https://github.com/hyperjump-io/json-schema-validator/blob/6eab4c231a954c6df46e57b33933e0b4a1216f08/lib/keywords/items.js#L30

@Relequestual
Copy link
Member

Why do you believe you need to combine the annotations?

In regards to 2, you may have an items array value in your schema which has fewer items than in the array it's being applied to. As such, items may only apply it's array schema values to the first n items in the applied array, and have y number of items in the applied array where no schemas have been applied. Happy to give an example if that helps.

Regarding 3. Consider annotations are only collected for successful validation. If the array schema values of items are applied to the first 2 elements in the applicable array for example, if the firs passes validation, but the second fails, then annotation won't be collected for items.
All must pass validation in order for annotations to be collected, and as such, it will be the max index to which the array schema values were applied, (assuming that all resulted in a positive assertion of vailidity).

Again, happy to provide examples if required.

It's a little confusing at first. I got this wrong and even showed a dud example the last time I gave a JSON Schema talk at an international conference... 😳

@ssilverman
Copy link
Member Author

ssilverman commented May 1, 2020

Your explanation of annotations not being applied if a schema doesn't pass was a good reminder. It makes sense. However, I'm still not understanding why Boolean and number "items" annotations won't be combined, and also why an "items" annotation can't be -1. Some examples follow.

An example illustrating why Boolean and number annotations would have to be combined:

Schema: {
  "allOf": [
    { "items": [ true, true ] },
    { "items": true }
  ]
}

Instance: [ "1", "2", "3" ]

When applying the first "items", an array, the max. index that got checked was "1", and so that's the annotation. Next, the second "items" is applied and it results in a Boolean annotation, "true". We have to combine these somehow. Or, instead of using a Boolean, we could instead use the max. processed index, which is "2". (The spec doesn't require a Boolean annotation here). Combining using the max. function results in "2".

Example showing "items" annotation with "-1":

Schema: { "items": [ ] }
Instance: [ "1", "2", "3" ]

What's the max. index in the instance to which the array schema was applied? It's not zero, so what should it be? That's why I was proposing that the annotation is actually the valid item count instead of the largest index.

@Relequestual
Copy link
Member

The allOf applicator doesn't effect the collected annotations of it's subschema values. The instance has multiple annotations in your example; they are not combined.
The spec says that Annotation keywords define their own rules for combining such values., which is where it may be confusing.

In terms of your example where items is an empty array. This is a no-op.

This keyword produces an annotation value which is the largest index to which this keyword applied a subschema.

It doesn't have any subschemas, so it cannot apply them, and therefore produces no annotation results. Given it's a no-op, the fact it doesn't procude an annotations has no impact on further validation.

@ssilverman
Copy link
Member Author

ssilverman commented May 1, 2020

Yes, you're making sense. :)

But one other thing is maybe I'm misunderstanding how annotations are being collected. I thought that annotations are combined and collected for each element in the instance, independent of where they're from in the schema. i.e. whenever a location in the instance is "touched" by the schema and it's appropriate to add an annotation, an annotation is added and combined with others that may already be there.

In my first example, two "items" schemas are applied to the main instance array (from inside the "allOf"; one is an array of schemas and the other is just a schema). This means that two "items" annotations will be applied to the main instance array. Is this not correct? If correct, don't two annotations that have the same name need to be combined?

As I create a validator to learn how this all works, the following is my current mental model: unique annotations are collected whenever an instance element is "touched". So if a particular instance element is "touched" twice by "items" (from anywhere), those two annotations need to be combined somehow. Is this mental model even correct? The schema spec implies this whenever it mentions "combining annotations".

Update: Quote from the spec:

Annotation results for "items" keywords from multiple schemas applied to the same instance location are combined by setting the combined result to true if any of the values are true, and otherwise retaining the largest numerical value.

Where else would a more "items" schemas get applied from if not from an "allOf" or something?

@jdesrosiers
Copy link
Member

@ssilverman My implementation doesn't collect annotations (not yet), so you don't want to look at my implementation as an example for that stuff.

@handrews
Copy link
Contributor

handrews commented May 1, 2020

Specific answer: items annotations (and see also #864 regarding splitting items) are there to support unevaluatedItems (and additionalItems although most implementations don't bother doing that one in terms of annotations as it's easier to do it without them).

A true value means that all items have been successfully evaluated, and therefore unevaluatedItems does not apply. With true, you don't have to check the length of the instance array to see if there are more items. It's a performance optimization. A "highest index" value means that if there are items beyond that index, those items have not been evaluated, so unevaluatedItems applies to them. You have to check to see if there are any such items.


Long general answer:

Combining annotations during collection is an idea that hasn't made as much sense as it seemed to at first.

The idea was simple enough: for readOnly and similar boolean annotations, what matters is whether there are any with a true value. This occasionally came up as a question: is a property or resource read-only if any true value is present, or only if all schemas that apply state that it is readOnly. There was a pretty solid consensus that any is correct for this sort of thing.

So it seemed like a good idea to put that into the spec, as I had this notion of annotations being rolled up while they were collected and producing some sort of end value. "I sorted out a ton of anyOf branches and ultimately, yes, this thing is read-only."

But that's not actually how any of our recommended output formats work. Instead, those formats provide you (the end user getting the data back, or the code processing a schema object and examining the annotations from subschemas or adjacent keywords ini that object) with the information needed to make that determination. They don't provide a notion of rolled-up value.

So I think we should drop that notion. We can still say that applications SHOULD treat a property as read-only (or whatever) if there is at least one readOnly (or whatever) annotation with a true value, but that would happen at the application level rather than in the annotation collection process.

In the case of unevaluatedItems and items (and prefixItems, in the next draft), you just look at the annotations and do the "combining" as needed. Since unevaluatedItems in particular is probably rarely used, that's arguably simpler and faster.

@handrews
Copy link
Contributor

handrews commented May 1, 2020

See also #906

I'm going to close this because I think everything has been addressed or belongs with #906.

Regarding "successful evaluation", with items it either all succeeds or the whole keyword produces a failure which causes annotations to be dropped, so the question is moot. But see also #810 for this question regarding contains, where it is not moot, and I can't remember where we are in that discussion because it makes my head hurt 😛

@handrews handrews closed this as completed May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants