Update "format" and "content*" for new JSON Schema #2200

Merged
merged 10 commits into from
May 28, 2020
"format" is an annotation
handrews committed May 15, 2020
commit bee097486732ad6ddc2d72035935c6680d803e05
2 changes: 2 additions & 0 deletions versions/3.1.0.md
@@ -153,6 +153,8 @@ However, to support documentation needs, the `format` property is an open `strin
Additional formats MAY be used even though undefined by either JSON Schema or this specification.
Types that are not accompanied by a `format` property follow the type definition in the JSON Schema. Tools that do not recognize a specific `format` MAY default back to the `type` alone, as if the `format` is not specified.

Note that by default, JSON Schema validators MUST NOT attempt to validate the `format` keyword. It is primarily treated as an annotation that informs applications of the intended nature of the data. For historical reasons, it is possible to configure JSON Schema validators to validate "format", but this is not consistently implemented and SHOULD NOT be relied upon in any environment where an unknown validator or unknown validator configuration might be used.

In the company I work for we use UUIDs to identify resources so type: string format: uuid is super common in our schemas. And we do leverage that information for a number of things.
It seems that this note is motivated by the current implementations in the wild, which does not seem like a good reason to include it in a standard. If these validators start improving, would this note go away? At what point? How many good validators would it take to make that happen?

Member Author

@jcarres-mdsol uuid and all other formats that are no longer explicitly listed are now part of JSON Schema proper. See Section 7.3.5 "Resource Identifiers" under the format keyword's specification for uuid and also how it relates to URNs of the form urn:uuid:.

Member Author

@jcarres-mdsol As for validators actually validating format, no, that's not going to happen. I don't mean that in a "we (the JSON Schema project) decided it shouldn't happen", I mean it in "this has been a problem for years, and after talking with numerous maintainers of major implementations, it is 100% clear that format is a confusing mess that will not be validated consistently no matter how much the spec says it should be." At some point you have to bow to reality and figure out a way forward, and saying that by default implementations MUST NOT attempt to validate using the format keyword was the only possible way to produce consistent default behavior. This is explained in detail in the JSON Schema draft's release notes.
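
To make the default concrete, here is a minimal Python sketch of the two modes: `format` collected as an annotation by default, and asserted only when explicitly configured. It is not taken from any real validator. The `evaluate_format` function, the `assert_format` switch, and the `FORMAT_CHECKERS` table are all hypothetical names, and the UUID check is a simple regex for the canonical hyphenated form.

```python
import re

# Hypothetical checker table; a real implementation would cover more formats.
_UUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")
FORMAT_CHECKERS = {"uuid": lambda s: _UUID_RE.fullmatch(s) is not None}


def evaluate_format(schema, instance, assert_format=False):
    """Evaluate only the `format` keyword; return (valid, annotations)."""
    fmt = schema.get("format")
    if fmt is None:
        return True, {}
    annotations = {"format": fmt}
    # Default mode: report the annotation, never fail validation.
    if not assert_format or not isinstance(instance, str):
        return True, annotations
    checker = FORMAT_CHECKERS.get(fmt)
    if checker is None:
        # Unrecognized format: behave as if only `type` were given.
        return True, annotations
    return checker(instance), annotations


schema = {"type": "string", "format": "uuid"}
print(evaluate_format(schema, "not-a-uuid"))
print(evaluate_format(schema, "not-a-uuid", assert_format=True))
```

With the default settings the invalid string still passes and the application receives the `format` annotation to act on; only the opt-in mode rejects it.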

The problem with format is that it tries to do a huge number of things, and be open-ended to any extension anyone wants to throw in there. There's not even agreement on what fully validating format would involve. My hope is that with re-usable extension vocabularies being a thing, people will design vocabularies with keywords that validate specific things, instead of shoving them into this all-purpose-but-never-reliable keyword.

So an isUuid keyword would be straightforward to implement b/c UUIDs are well-specified. For something like email, where there's no consensus on what "valid email" means, people could create keywords with specific criteria. isProbablyEmail might just check for an @ in the string (this is what several validators actually do right now). isEmailByRegex might specify validation against a specific regex (instead of what happens now, which is that each validator comes up with its own incredibly complex regex that they probably got from a web site somewhere). isEmailBySomeLibraryName would define what library to use to validate, etc.
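
As a rough illustration of how narrowly scoped keywords differ from an open-ended `format`, the sketch below implements three of the keywords named above in Python. None of these keywords exist in any published vocabulary. The semantics chosen here (a `true` value turns a check on, `isEmailByRegex` takes the pattern as its value, non-string instances are ignored) are assumptions made for the example.

```python
import re

_UUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def is_uuid(value, arg):
    # Well-specified: canonical hyphenated UUID text form.
    return arg is not True or _UUID_RE.fullmatch(value) is not None


def is_probably_email(value, arg):
    # Deliberately weak: only promises that an "@" is present.
    return arg is not True or "@" in value


def is_email_by_regex(value, arg):
    # The schema author supplies the exact regex, so every
    # implementation applies the same criteria.
    return re.fullmatch(arg, value) is not None


# Hypothetical keyword registry for this sketch.
KEYWORDS = {
    "isUuid": is_uuid,
    "isProbablyEmail": is_probably_email,
    "isEmailByRegex": is_email_by_regex,
}


def validate(schema, instance):
    """Return the names of the keywords that the instance fails."""
    if not isinstance(instance, str):
        return []
    return [
        name
        for name, arg in schema.items()
        if name in KEYWORDS and not KEYWORDS[name](instance, arg)
    ]


print(validate({"isUuid": True}, "123e4567-e89b-12d3-a456-426614174000"))
print(validate({"isEmailByRegex": r"[^@\s]+@[^@\s]+"}, "a b@example.com"))
```

Each keyword states exactly what it checks, so a schema author can pick the level of strictness instead of relying on whatever a given validator does for `format: email`.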

These keywords would then be supported or not, which could be detected/enforced using the $vocabulary mechanism, and would reliably provide whatever level of validation they specify. You could require support for isUuid but not require any sort of email validator, for example; isUuid would be likely to be supported. As would keywords for RFC 3339 date and time values, which are likewise well-specified.
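
A sketch of the detection step, in Python: in a meta-schema, `$vocabulary` maps vocabulary URIs to booleans, where `true` means an implementation must refuse to process schemas if it does not recognize that vocabulary and `false` means the vocabulary is optional. The `example.com` URIs, the `check_vocabularies` helper, and the exception class below are hypothetical and exist only for this example.

```python
class UnsupportedVocabulary(Exception):
    """Raised when a required vocabulary is not recognized."""


def check_vocabularies(metaschema, supported):
    """Return optional vocabularies that will be ignored.

    Raises UnsupportedVocabulary if a required one is missing.
    """
    ignored = []
    for uri, required in metaschema.get("$vocabulary", {}).items():
        if uri in supported:
            continue
        if required:
            raise UnsupportedVocabulary(uri)
        ignored.append(uri)
    return ignored


# Hypothetical meta-schema: UUID keywords required, email keywords optional.
metaschema = {
    "$vocabulary": {
        "https://example.com/vocab/uuid": True,
        "https://example.com/vocab/email": False,
    }
}

print(check_vocabularies(metaschema, {"https://example.com/vocab/uuid"}))
```

An implementation that knows the UUID vocabulary proceeds and reports that the email vocabulary is being ignored; one that knows neither stops with an error instead of silently skipping the UUID checks.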

Ultimately, I hope to deprecate format because it's always been a compatibility nightmare. But that will only happen if enough replacement vocabularies are developed and adopted widely enough to be a viable migration path.

@jcarres-mdsol jcarres-mdsol Apr 15, 2020

Thanks for the detailed response. Hopefully the plan will go as expected. I'm worried about the complexity of all this, I wonder if people will shy away from vocabularies and just keep shoveling everything into format anyways. But time will tell.
Thanks for your efforts!

Member Author

@handrews handrews Apr 17, 2020

@jcarres-mdsol we're definitely starting to see people understand the utility of the vocabulary concept on our slack. But really, if enough people accept vocabularies as a solution here, we'll deprecate/remove format. If they don't, we won't. [EDIT: although we may look for other ways to do so]


The formats defined by the OAS are:

[`type`](#dataTypes) | [`format`](#dataTypeFormat) | Comments