Blog entry about defects found in public JSON schemas. #40
Every one of your examples is an illustration of a typographical error, specifically a keyword that is ineffectual because a closing brace was misplaced. This can hardly be attributed to the spec. This happens when writing JSON in general. It happens even more with YAML.
The analysis presented in this paper makes a false assumption about JSON Schema: that it's intended for data modelling. As such many of its conclusions are incorrect.
JSON Schema is a collection of constraints. Keywords are independent because it allows them to be combined however the user needs. If they want schemas that represent data modelling, that's possible, but then they need to understand how JSON Schema works in order to include the proper constraints that model that data.
Having a multitude of keywords enables users to isolate the behavior they want. Moreover, the vocabulary system allows them to create their own keywords and dialects in order to make JSON Schema into whatever they need. JSON Schema's flexibility allows people to have control over what they want to validate.
There is nothing inherent about JSON Schema that is causing authors to write bad schemas. Developers write bad code all of the time. C++ isn't inherently flawed because developers write code that manages memory poorly. That's just bad code. C++ simply offers more control over memory management. Sometimes you need that level of control.
The spec isn't at fault. What's lacking is proper tooling (and perhaps documentation) to help guide schema authors toward writing better schemas.
photo: /img/avatars/claire.jpg
link: https://www.linkedin.com/in/claire-medrala/
byline: Research Engineer
excerpt: Evidences show that schemas are hard to write, and suggest changes in the spec
I absolutely object to putting (or hinting at) third party recommendations for spec changes in our blog.
This is the JSON Schema blog. It is a place for us to show off what it can do, not a forum for discussion about change or shortcomings. The appropriate place for that is issues and discussions.
> I absolutely object to putting (or hinting at) third party recommendations for spec changes in our blog.

Ok. The blog is only for your recommendations and discussions. Fine, it is your blog after all. As academics, we are more used to open discussions and disagreements.

> This is the JSON Schema blog. It is a place for us to show off what it can do, not a forum for discussion about change or shortcomings. The appropriate place for that is issues and discussions.

Ok.
These findings suggest key changes in JSON Schema specification which would block most
of encountered defects.
I disagree with the conclusion that because people aren't using JSON Schema correctly (in many cases, they're typographical errors) that JSON Schema is at fault and needs to change.
Yes and no. Misplacing a keyword leads the system to silently ignore the ineffectual keyword. That is a choice of the language semantics. Different choices could lead to a "your schema is invalid" error in many (but not all) cases.
I'm unclear on where you see this assumption. Our study does not assume a particular use case, whether data modelling or something else, because we do not really have any relevant information about this! We just look at existing schemata, without knowing why or for what they were developed, and look for factual errors.
A lot of the allowed combinations do not make much sense. We did not find significant cases where such freedom was a requirement.
{
  "type": "object",
  "minLength": 10,
  "pattern": "^[0-9]*[a-z]*$",
  "maxItems": 42,
  "minSize": 17
}
Why allow the above nonsense?
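To make the "silently ignored" behaviour concrete, here is a minimal sketch in Python. It is a hand-written toy covering only the keywords in the example, not a real JSON Schema implementation; the names `validate`, `TYPE_CHECKS` and `EXAMPLE_SCHEMA` are made up for illustration. It mirrors two rules of the spec: type-specific keywords only apply to instances of that type, and unknown keywords are not treated as errors.

```python
import re

# Toy mapping from JSON Schema type names to Python types (illustrative subset).
TYPE_CHECKS = {"object": dict, "array": list, "string": str}

# The schema from the example, as a Python dict.
EXAMPLE_SCHEMA = {
    "type": "object",
    "minLength": 10,
    "pattern": "^[0-9]*[a-z]*$",
    "maxItems": 42,
    "minSize": 17,
}


def validate(schema, instance):
    """Return True if `instance` satisfies the few keywords this toy supports."""
    expected = schema.get("type")
    if expected in TYPE_CHECKS and not isinstance(instance, TYPE_CHECKS[expected]):
        return False
    # String keywords are only evaluated against string instances.
    if isinstance(instance, str):
        if "minLength" in schema and len(instance) < schema["minLength"]:
            return False
        if "pattern" in schema and re.search(schema["pattern"], instance) is None:
            return False
    # Array keywords are only evaluated against array instances.
    if isinstance(instance, list):
        if "maxItems" in schema and len(instance) > schema["maxItems"]:
            return False
    # Unknown keywords such as "minSize" are never looked at.
    return True


# Any object passes: only "type" has an effect on the outcome.
print(validate(EXAMPLE_SCHEMA, {}))
print(validate(EXAMPLE_SCHEMA, {"anything": [1, 2, 3]}))
```

Under these assumptions, four of the five keywords contribute nothing to the result, and nothing tells the author so.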
Trivial errors are silently ignored because of the chosen semantics, so the user is likely never
A C++ compiler filters out a lot of errors through type checks, mandatory declarations, and so on.
The evidence we gathered demonstrates that (1) people get it wrong quite often (>60%) and (2) some spec changes would improve this situation (we tested our proposals). AFAICS both of these points are facts. I understand that the spec will be broken again on the next release, so you seem to also believe that it can be improved and that it is worth breaking compatibility. At last a point of agreement!
There are hundreds of existing tools, but the right one is still missing? There is a suggestion that a linter would help. Sure, we implicitly wrote one to detect the various errors reported in the paper. Now, if a linter somehow restricts the language by filtering issues, then why not try to put at least some of these restrictions in the language itself, so that all conformant tools would check them?
The assumption is present in the "invalid" cases you present. You're assuming that a schema has to align with data patterns in programming languages. You're assuming validation (which arguably is the primary purpose of JSON Schema). But there are many use cases, most of which we still don't know about. For example, code generation. Many languages support union types. If I want to generate a union type, I might combine keywords that don't otherwise make sense.
It may be nonsense to us, but we can't guarantee that some user actually has a purpose for something like that. JSON Schema is intentionally permissive in order to account for as many use cases as possible. Yes, many schemas appear to serve no purpose or contain ineffectual keywords, however it's impossible for us to rule out the possibility that some user has a real use for such a schema. This is where a linter comes in. A linter will warn the user that a specific construct doesn't generally make sense, but the user still has the option to ignore the warning and do it anyway. If JSON Schema disallows such things, then the user no longer has that choice, and we've prevented them from doing what they want to do. The point is to allow users to find new use cases without restriction. The solution to helping these users that you found is targeted tooling. Yes, some such tooling already exists, and we've partially built some. However what's there is not well-integrated into the common IDEs and editors, so they're not typically used.
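As a rough illustration of the linter approach described here, the following Python sketch warns about keywords that cannot take effect for the declared `type`, and about keywords it does not recognise, without rejecting the schema. Everything in it is an assumption made for illustration: the function name `lint`, the `APPLIES_TO` and `KNOWN` tables (which cover only a small subset of keywords), and the warning wording. It does not recurse into subschemas and is not an existing tool.

```python
# Which instance type each keyword constrains (illustrative subset only).
APPLIES_TO = {
    "minLength": "string", "maxLength": "string", "pattern": "string",
    "minItems": "array", "maxItems": "array", "uniqueItems": "array",
    "minProperties": "object", "maxProperties": "object", "required": "object",
    "minimum": "number", "maximum": "number", "multipleOf": "number",
}

# Keywords this sketch recognises; a real linter would derive these from the dialect.
KNOWN = set(APPLIES_TO) | {"type", "properties", "items", "enum", "const", "$ref"}


def lint(schema):
    """Return a list of warnings for a single (non-nested) schema object."""
    declared = schema.get("type")
    types = {declared} if isinstance(declared, str) else set(declared or [])
    if "integer" in types:
        types.add("number")  # numeric keywords also apply to integers
    warnings = []
    for keyword in schema:
        if keyword not in KNOWN:
            warnings.append(f"unknown keyword '{keyword}' will be ignored")
        elif types and keyword in APPLIES_TO and APPLIES_TO[keyword] not in types:
            warnings.append(
                f"'{keyword}' only applies to {APPLIES_TO[keyword]} instances, "
                f"but type is {sorted(types)}"
            )
    return warnings


suspicious = {
    "type": "object",
    "minLength": 10,
    "pattern": "^[0-9]*[a-z]*$",
    "maxItems": 42,
    "minSize": 17,
}
for message in lint(suspicious):
    print("warning:", message)
```

The schema is still accepted; the user only gets warnings and can choose to ignore them, which is the difference between this approach and a restriction in the spec itself.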
Yes. The vast majority of the "hundreds of existing tools" are validators, and many of them don't support even the latest version of the spec, which is almost three years old. Beyond validators, there are various generators and a few other targeted/single-purpose tools. Few of them are editors, and we are actively working with those to help them improve their offerings.
I also think the data sources leave something to be desired.
I'm surprised there's no mention of OpenAPI or AsyncAPI, arguably the largest usage points of JSON Schema. I wouldn't be surprised if more people used JSON Schema indirectly through one of these specs than they do directly. There's still a lot of good work done with this study, and it would be useful for creating linting tools. I just don't agree with some of the conclusions. But the biggest thing for me, though, is that I can't back putting third party spec change recommendations and advertising potential competitors or alternative proposals in our blog.
We'd like to thank you for contributing this blog post proposal. We recognize the big effort behind the study backing the blog, and we are sure we can extract great insights from it; however, this content differs from the Community driven content we expected. We'd like to learn from your work and be able to discuss your conclusions but, most importantly, make sure we serve the JSON Schema Community the best way. This is why we'd like to invite you to move the discussion to this Community discussion and continue a constructive conversation there to make the most of this opportunity. We'd like to thank you once again for this contribution. This situation inspired the community to work on publishing the blog guidelines and make this experience better in the future. Please @zx80, join us in this discussion.
Users have a hard time remembering the 60 keywords and writing schemas.
We think that this can be significantly improved with limited changes to
the spec.
They could also be found with a linter mode, which has been proposed here - https://github.com/orgs/json-schema-org/discussions/323 and json-schema-org/json-schema-spec#1079
Thanks for the pointers.
This adds a 3-minute blog entry, a cover image from Unsplash and two small avatars.