Is "type" a required keyword? If not, how should validation be handled when a type is not specified? #172
Type should not be inferred from other keywords. What the spec is attempting to say is: some keywords (such as "minimum" or "required") only apply to certain types of data, and are simply ignored when the instance is of any other type. Therefore, if your schema is, say:
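{ "required": ["a", "b"], "minimum": 5 }

then an object must contain properties "a" and "b" (and "minimum" is ignored), a number must be at least 5 (and "required" is ignored), and an instance of any other type passes both keywords.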
So if I'm not mistaken, the schema above accepts, say, any string or boolean, since neither keyword applies to those types. In my opinion the schema above looks rather unnatural. When you write two keywords side by side, one usually reads the comma as an "and", so I think a natural solution would be to state that each keyword also constrains the type it applies to, which would make schemas such as this one unsatisfiable.
That's correct. And yeah, the system you describe would also make sense, although I'm not sure whether it makes more intuitive sense to me. That's not how the specs have worked so far, and it would be a pretty big breaking change.
I find implicit type validation intuitive only in the first case, when you have keywords that apply to the same type. In cases where you mix keywords for different types, I find it quite awkward to change the implicit Boolean "and" logic to "or". Currently the spec, although not very intuitive at first, clearly disambiguates any such case by simply ignoring keywords that do not apply to the actual data type and marking them as passing. This also means that a schema writer must be explicit about the required type, and I find this a good thing in terms of security, the latter being one of the most common contexts a JSON validator is used in.
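To make the security angle concrete with an illustrative example: the schema { "maxLength": 5 } accepts an arbitrarily large object or array without complaint, because "maxLength" is ignored for non-strings. A writer who forgets "type": "string" gets no validation at all for other types, which is exactly the kind of silent hole you do not want in a validator sitting on a security boundary.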
Hey, in my opinion the most intuitive thing to do in this case is to reject all instances of documents that don't state a particular type. Furthermore, there should be instances of JSON Schemas that we could call well-formed schemas, and other kinds of schemas that we could call not well-formed schemas. On one hand, an instance of a not well-formed schema would be something like this:
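{ "minimum": 4, "maximum": 8 }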
As you can see, this is kind of a syntax error, since the type is missing from the JSON Schema. These kinds of schemas should be handled as invalid JSON Schemas (from the side of the programmer). On the other hand, we have the well-formed schemas, which can be satisfiable or not. An example of a well-formed schema that is not satisfiable could be something like this:
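{
  "type": "number",
  "minimum": 8,
  "exclusiveMinimum": true,
  "maximum": 4,
  "exclusiveMaximum": true
}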
As you can see, the syntax of the schema doesn't have any problems, but there is no JSON document that could satisfy it, since there is no number bigger than 8 and smaller than 4 at the same time. These kinds of schemas should be handled as valid JSON Schemas, even if they are not satisfiable. In my opinion this is a much better way to see the problem. Even if you can work with what the specification states right now, I think this could be seen as a good practice for the community. The reason is that it gives much more sense to the existence of boolean operators such as anyOf, allOf, not, etc.: in this context, you could use these operators to represent schemas that explicitly combine several types.
@jessepinho the value should successfully validate, as it would for the empty schema {}.
It's a good point raised by @jessepinho. I had assumed @jessepinho's "latter" interpretation without even considering this alternative, because it's just conceptually simpler: in the latter interpretation we have one constraint per schema object nesting level. I don't think I'm the only one who interpreted the spec this way. In fact, at least the PHP JSON Schema implementation I'm currently using also reads it this way.
From @juanreutter:
I think this nails home the point that a schema without an explicit "type" is confusing. As for @geraintluff's point that it would be a breaking change, I (sort of) agree that this is a concern, but consider that the spec has made breaking changes before: v3 to v4 was one.
The spec does state the behaviour. Which keywords are tested or ignored is defined by the type of the data. The allowed types are specified by the "type" keyword:

{
"type": ["string", "number"],
"maxLength": 5, // if it's a string, length of at most five
"oneOf": [ // if it's a number, either between 0-10, or between 100-200
{"minimum": 0, "maximum": 10},
{"minimum": 100, "maximum", 200}
]
}

The above example illustrates how this works - "maxLength" only constrains strings, while the numeric ranges inside "oneOf" only constrain numbers.

To illustrate why I think it's a bigger change than the v3/v4 ones: while the changes from v3 to v4 were syntactically incompatible, whenever they agreed on syntax they also agreed on behaviour. This means it is straightforward to implement a validator that handles both v3/v4 without having to be told which version it should be using.
The spec indeed defines the cases of no type and multiple types, but this seems rather unnecessary and confusing (hence this issue). The schema presented by @geraintluff is equivalent to writing:
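{
  "anyOf": [
    { "type": "string", "maxLength": 5 },
    {
      "type": "number",
      "oneOf": [
        {"minimum": 0, "maximum": 10},
        {"minimum": 100, "maximum": 200}
      ]
    }
  ]
}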
which is much more intuitive and easy to read. Moreover, notice that this is not equivalent to stating the types explicitly inside the "oneOf" branches:
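{
  "type": ["string", "number"],
  "maxLength": 5,
  "oneOf": [
    {"type": "number", "minimum": 0, "maximum": 10},
    {"type": "number", "minimum": 100, "maximum": 200}
  ]
}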
since the latter schema does not validate strings: with the inner types stated, both "oneOf" branches fail for a string, so the whole schema rejects it. To me this is counterintuitive. My conclusion is that allowing for multiple or empty types makes schemas harder to read and easier to misinterpret.
These facts make me lean towards making the "type" keyword mandatory. About this being a breaking change, I agree with @jessepinho's thoughts, and would like to add that it would be a breaking change only in terms of the specification. I don't think (I could be wrong though) that people actually use schemas with no type or with multiple types.
+1 @mugartec
+1 @mugartec I can see what @geraintluff is trying to achieve by allowing a lot of freedom in the way people can write schemas (after all, JSON is supposed to be very intuitive), letting them factor out many parts and write high-level descriptions. The problem is that writing a formal specification is quite difficult when you want this level of generality. Overall I believe it would be beneficial for the community to have a very clean standard for the schema and, seeing what was proposed above, no expressive power would be lost. I agree that with this people might have to be a bit more pedantic when writing schemas, but it seems that this would be for the best in the long run, as it would be less prone to introducing unwanted effects.
Actually the meaning of the comma is consistently AND. The changes proposed amount to either requiring the "type" keyword, or changing the spec's wording from:

"Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed."

to:

"Some validation keywords only apply to one or more primitive types. When the primitive type of the instance being validated is not applicable to a validation keyword, validation for this keyword and instance SHOULD succeed. For example validating the string value "foo" against the "minimum" keyword SHOULD succeed because "minimum" is only applicable to numeric types."
Consider the case where you want to define a schema where any valid JSON is valid. Under the current spec, that is just the empty schema:

{}

If "type" were required, it would become:

{ "type": ["array", "boolean", "integer", "number", "null", "object", "string"] }

If multiple types were disallowed as well, it would become:

{
"anyOf": [
{ "type": "array" },
{ "type": "boolean" },
{ "type": "integer" },
{ "type": "number" },
{ "type": "null" },
{ "type": "object" },
{ "type": "string" }
]
}

In the cases where "type" is required, this no-op schema gets progressively more verbose. It is nice that we can start with an empty schema that validates everything and then add keywords to add constraints. Any keyword that is not in the schema has no effect on validation. No keyword is exempt, not even "type". I can not agree that the documentation is ambiguous, but it is undeniable that it leaves many people confused. I think it would be a good idea for the documentation to acknowledge the common misconception that leads to threads like this and add further explanation to avoid more people getting confused. That is the only change proposed in this thread that I would support.
👍 😄 How would we go about making such a change to the spec? Just PR it to the repo?
@jdesrosiers I understand the main idea behind the current definition more clearly now, but in both cases you could start with an empty schema and change the way you state the constraints (i.e. here you could impose that type is required). However, this would require quite some rewriting of the syntax. Now, about your explanation that "it is nice that we can start with an empty schema that validates everything and then add keywords to add constraints": I do not think this applies to the examples above, as they effectively say "if the constraint applies to the instance's type, apply it; if not, accept".
I cannot imagine a situation in which someone would like to do something like this; in this case you just avoid validating. This is not a real world example, and any instance of such a schema would probably be a mistake. Now, in the hypothetical case you wanted to write such a schema, I still think that explicit schemas are a better option. Moreover, there is no need to write such a verbose schema with every type:
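{
  "anyOf": [
    { "type": "string" },
    { "not": { "type": "string" } }
  ]
}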
will do the trick. This schema explicitly says "a document that is either a string or not a string", so it is intuitive and readable. On the other hand, the empty schema {} does not say anything at all. My argument is that there should be a direct correspondence between what the schema syntactically says and the restrictions it imposes. Experience says that this property is fundamental to the massive adoption of any standard or convention, since people usually won't read the specification. In the case of JSON Schema, this correspondence occurs precisely when the "type" keyword is stated explicitly. As to @jdesrosiers' comment:
I cannot agree with this: if you already have a validator for v4, just adding the condition of requiring "type" would be straightforward. As I said before, making "type" a required keyword introduces a breaking change only if people are using schemas with no type or with multiple types. It would be nice if someone exhibited a real world example in which using multiple or no types comes in handy and is more natural than using combinations of schemas. Differences aside, I think we are on completely different pages. On one side, it is true that the specification is not ambiguous, that this is the standard, and that it would be a breaking change. On the other side, making this change would make schemas simpler and easier to read without restricting their expressive power. I agree with both arguments, I just think the latter is better for the adoption of JSON Schema. Finally, I'd like to mention that implementing this change allows for a passive intermediate step. In the next version, schemas with multiple or no types could still be allowed, but every time such a schema is used a warning is emitted (common practice) saying that this will not be allowed in the following version. The user then updates the schema, and in the next version validation continues working the same way.
@mugartec and @jdesrosiers I still do not see why you would not use the empty schema.
@sam-at-github, there is an open pull request regarding this at json-schema-org/json-schema-spec#4. I'm not a fan of the proposed new wording. I think we can do better. In particular, I think it will be most helpful to explicitly call out the misconception.
@DomagojVrgoc It's not that we wouldn't use the empty schema, it's that there is no reason to include it in the definition; it doesn't let you say anything more and could induce mistakes. If you only ask "why not include it?", your specification will end up messy and full of useless stuff. I repeat that it would be nice if someone exhibited a real world example in which using multiple or empty types comes in handy and is more natural/readable than using combinations of schemas.
I think this kind of thing trips people up with JSON Schema a lot. Assumptions are made based on what you are familiar with. A common thing I see is that when people don't see familiar OOP terms like extends, inherit, and abstract, they assume that schemas aren't extensible or reusable. JSON Schema can be confusing at first because it takes a unique approach.
This is a proposal I think is worth discussing. It doesn't suffer from the empty schema problem and it forces people to write more readable schemas. However, my instinct is to keep the specification simple and flexible. There may be ways JSON Schema could be used that none of us have thought of.
I gave a real world example when I presented the problem. Is the meta-schema example not good enough for you?
Your example is clever, but it is just another example of having to jump through hoops to get a no-op. I'm not convinced. I think it is always a good idea to have a concept of zero. Just because you can't think of a use for it doesn't mean that someone out there won't think of one.
Of course there is a direct correspondence between what the schema says and the restrictions it imposes. I'm not following this argument.
I was not speaking against the proposal with these words. I was speaking generally about the approach JSON Schema takes. You are right that making "type" a required keyword introduces a breaking change only if people are using schemas with no type or with multiple types.
The meta-schema uses a schema with no type. It would be a pretty significant breaking change if it breaks the meta-schema.
Making this change doesn't make schemas simpler and easier to read, it just makes it harder to write schemas that are hard to read. That is certainly a good thing, but my instinct is to leave the specification simple and flexible for someone who has thought of some useful way to use JSON Schema that we haven't considered. Abuse of this feature isn't a real problem, so making this restriction could only really serve to potentially stifle someone's creativity.
I thought it would be instructive to see a couple examples where leaving out "type" is useful.

Enums

When using "enum", a single list can mix values of several types. Consider this enum describing a falsey value:

{ "enum": [0, false, ""] }

... becomes ...

{
"anyOf": [
{ "type": "number", "enum": [0] },
{ "type": "boolean", "enum": [false] },
{ "type": "string", "enum": [""] },
]
}

It's pretty clear which one is easier to read and work with.

Nullable

I have seen people have requirements that a property must be present, but can be null. As silly as this seems, this is a real world example I have come across.

{
"type": "object",
"properties": {
"foo": {
"type": ["string", "null"],
"minLength": 2
}
},
"required": ["foo"]
}

... becomes ...

{
"type": "object",
"properties": {
"foo": {
"anyOf": [
{
"type": "string",
"minLength": 2
},
{ "type": "null" }
]
}
},
"required": ["foo"]
}

I know which schema I would rather work with.

Additional Properties

Check out this example from Stack Overflow where it was necessary to declare properties without any constraints in order to disallow additional properties. If "type" were required, those empty property schemas could not be written.

Type duplication

When using combining keywords like "anyOf" in a schema that already declares its type, requiring "type" in every subschema means repeating it:

{
"type": "integer",
"minimum": 2,
"anyOf": [
{ "maximum": 5 },
{ "multipleOf": 5 }
]
}

... becomes ...

{
"type": "integer",
"minimum": 2,
"anyOf": [
{ "type": "integer", "maximum": 5 },
{ "type": "integer", "multipleOf": 5 }
]
}

Not only does this addition not make the schema more readable, it makes it error prone, because now the type is declared three times and they must all be the same.
@jdesrosiers Thanks for the new examples, they are actually convincing. I think the nullable one is a bit strange though; I'd rather make "foo" optional and, whenever it is present, a non-null string. Anyways, having now more questions than answers, I just wanted to clarify the next point, as I think it is the important part:

"there should be a direct correspondence between what the schema syntactically says and the restrictions it imposes"
By this I mean a very practical thing: consider the schema
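{ "type": ["string", "null"], "minLength": 2 }

A reader may well think that every value of this property has at least two characters, when in fact null passes "minLength" untouched. This situation can be avoided by simply rewriting the schema above as

{
  "anyOf": [
    { "type": "string", "minLength": 2 },
    { "type": "null" }
  ]
}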
I know it is more verbose, but it is more readable and much less susceptible to misinterpretations.
I completely agree. The guy I was helping insisted it needed to be that way. I'm assuming some legacy issue he had to work around. I wouldn't ever recommend this pattern in a greenfield project. The additional properties example is the same way. I wouldn't recommend doing things that way, but sometimes we don't have a choice and JSON Schema needs to be able to support us in these times as well.
Agreed. Not the way to go. I agree that your explicit schema is better than the simple one in the case you present. I think we are completely on the same page about not wanting to see people writing schemas like that. I am just not seeing a good way to enforce this kind of thing in the specification without losing anything. I think making "type" required would cost us more than it would gain.
@jessepinho discussion here seems to have slowed to a halt. Could you please close this? There is no one remaining with the project who is able to do so; only the original person who opened the issue can. You can open a new issue at json-schema-org/json-schema-spec if you want to discuss this further. I'll also throw in another example. The only way to specify an odd integer is:

{
"type": "integer",
"not": {"multipleOf": 2}
} If "multipleOf" implied a type of "integer", then this would be an impossible schema. There are actually a bunch of these. Some of them can be rearranged if you are writing them inline, but if you are $ref-ing a shared definition, that is not always possible. |
To keep it in the same context here: the odd integer schema can also be written as:

{
"allOf": [
{"type": "integer"},
{"not": {"type": "integer", "multipleOf": 2}}
]
}

PS: I've read the paper "Foundations of JSON Schema", of which @fsuarezb in this thread is a co-author. As I'm implementing static analysis for JSON Schema, and have almost given up by now, the only way out for me is to put restrictions on how schemas are written while still complying with the spec. Another interesting article is https://www.genivia.com/sjot.html. One of the JSON Schema downsides stated in that article seems to be true, if I understand the paper correctly (full details are here: https://repositorio.uc.cl/bitstream/handle/11534/16908/000676530.pdf). However, only requiring "type" would not be enough to address it. Finally, this is just information that I think is worth mentioning and might help the spec be more formal. This is not a direct proposal. Maybe JSON Schema's purpose is only runtime validation, in the presence of a document or instance.
According to json-schema.org:

"Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed."
However, it is unclear whether the "primitive type" refers to the instance itself, or to the schema's "type" keyword. If the latter, how should an instance be validated if the schema is missing a "type" keyword for the instance? For example, if an instance is validated against a schema containing simply { required: ["a", "b"] }, should it be inferred that the schema describes an object, and thus validation should fail if the instance is, say, an integer? If so, what about if the schema also has non-object keywords, like "minimum"?

IMHO, a schema missing a "type" keyword should be considered invalid, as that makes validation of an instance impossible. And as a result, any instance validated against it should succeed.