JSON Schema Definition Languages

We recently settled on using JSON as the preferred format for the REST-based distributed application on which I am working. We don’t need the expressiveness of XML, and JSON is a lot cheaper to generate and parse, particularly in Ruby. Now we are busy defining dialects to encode the data we have, which is happy work. The only problem is that there is no widely accepted schema language for describing JSON documents.

I am not entirely sure a schema language for JSON is necessary in any strict sense. I think that validating documents against a schema is overrated, and people do seem to be getting along just fine with examples and prose descriptions. Still, the formalist in me is screaming out for a concise way to describe the JSON documents we accept and emit.

I have a weakness for formal grammars. I often write ABNF grammars describing the inputs and outputs of my programs, even in situations where most people would just use a couple of examples. I learned XML Schema very early in its life and had a love/hate relationship with it for years. Even though it is ugly and complicated, I continued to use it because it let me write formal descriptions of the XML variants I created.[1]

There are a couple of relatively obscure schema languages for JSON. Unfortunately, I don’t find either of them satisfactory.

Cerny

The Cerny schema validator seems quite functional, and while it is not intended as a JSON document schema language, it could be used as one.[2] Unfortunately, CERNY.schema requires a complete JavaScript interpreter and run-time to perform validation. This requirement stems from the fact that CERNY.schema allows parts of the schema to be defined as procedural JavaScript code. This approach is completely unacceptable for a language-independent data transfer format like JSON.

Such procedural checking is misguided, even beyond the practical problems of requiring a JavaScript run-time. Procedural validation code is a powerful technique for validating documents. However, it greatly reduces the usefulness of schema documents as an informational tool. Schema languages should be designed to communicate the structure of documents to humans, and only incidentally to validator programs.
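To make the declarative/procedural distinction concrete, here is a minimal Python sketch (Python is used only for illustration; nothing here comes from CERNY.schema). The schema is plain data that a person can read at a glance, and a small generic interpreter does the checking. The schema keys (`type`, `properties`, `required`, `items`) and the `validate` function are hypothetical, invented for this example.

```python
import json

# A hypothetical declarative schema: plain data describing the document's shape.
# The keys ("type", "properties", "required", "items") are assumptions made for
# this sketch, not the syntax of CERNY.schema or any other real schema language.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "emails": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name"],
}

# Map the schema's type names onto the Python types that json.loads produces.
JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def validate(value, schema, path="$"):
    """Return a list of error messages; an empty list means the value conforms."""
    expected = schema["type"]
    # bool is a subclass of int in Python, so reject it explicitly for numeric types.
    if isinstance(value, bool) and expected in ("integer", "number"):
        return [f"{path}: expected {expected}, got boolean"]
    if not isinstance(value, JSON_TYPES[expected]):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required member '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


if __name__ == "__main__":
    good = json.loads('{"name": "Ada", "age": 36, "emails": ["ada@example.com"]}')
    bad = json.loads('{"age": "thirty-six", "emails": [42]}')
    print(validate(good, PERSON_SCHEMA))  # []
    print(validate(bad, PERSON_SCHEMA))   # three errors: missing name, bad age, bad email
```

A procedural check for the same document would be a hand-written function full of `isinstance` tests. It would validate just as well, but a reader would have to trace the code to learn what a valid document looks like.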

Kwalify

Another JSON schema language I have tried is Kwalify. Kwalify also seems reasonably capable, but it has some warts that really bother me. My main issue with Kwalify is that it is super verbose. This is primarily because Kwalify schema documents are written in YAML (or JSON). Schema definitions can be encoded in a generic hierarchical data language, but it is not a very good idea. I find both XSD and the XML variant of RelaxNG to be excessively noisy, and Kwalify shows a similarly poor signal-to-noise ratio. Schema language designers should look to RelaxNG’s compact syntax for inspiration and forget trying to encode the schema in the language being described.
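The signal-to-noise complaint is easy to see by putting the same information in both forms. The Python sketch below holds a small schema in the style of Kwalify (its `type`, `mapping`, `sequence` and `required` keys), expressed as ordinary data just as a Kwalify schema is, and renders it into a one-line notation. Only a small subset of Kwalify is mirrored, and the compact notation and the `compact` function are hypothetical, invented here only in the spirit of RelaxNG's compact syntax.

```python
import json

# A schema in the style of Kwalify: the schema is itself ordinary data, using
# Kwalify's "type", "mapping", "sequence" and "required" keys. Only a small
# subset of Kwalify is mirrored here; this is an illustration, not its full spec.
KWALIFY_STYLE_SCHEMA = {
    "type": "map",
    "mapping": {
        "name": {"type": "str", "required": True},
        "age": {"type": "int"},
        "emails": {"type": "seq", "sequence": [{"type": "str"}]},
    },
}


def compact(schema):
    """Render the same schema in a hypothetical one-line compact notation.

    The notation ("name: str", "?" for optional members, "[...]" for sequences)
    is invented for this sketch, loosely in the spirit of RelaxNG's compact syntax.
    """
    kind = schema["type"]
    if kind == "map":
        members = []
        for key, sub in schema["mapping"].items():
            marker = "" if sub.get("required") else "?"
            members.append(f"{key}{marker}: {compact(sub)}")
        return "{ " + ", ".join(members) + " }"
    if kind == "seq":
        # Kwalify-style sequences list exactly one item schema.
        return "[" + compact(schema["sequence"][0]) + "]"
    return kind  # scalar type names pass through unchanged


if __name__ == "__main__":
    verbose = json.dumps(KWALIFY_STYLE_SCHEMA, indent=2)
    terse = compact(KWALIFY_STYLE_SCHEMA)
    print(verbose)
    print(terse)
    print(len(verbose), "characters vs", len(terse))
```

For this schema, `compact` returns `{ name: str, age?: int, emails?: [str] }`, while the indented JSON encoding of the same information runs to about twenty lines, most of them braces and repeated `"type"` keys.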

Conclusion

I think JSON could benefit from a schema language that is completely declarative and has a compact, readable syntax. If anyone is working on such a thing, I would love to know about it. I could roll my own, but I would really rather not unless it is absolutely necessary.


  1. Later I learned of RelaxNG. Now I am able to have my cake and eat it too. RelaxNG is a much simpler and more elegant way to describe XML documents. And if you really need XML Schema for some part of your tool chain, you can mechanically convert the RelaxNG into XML Schema.

  2. Updated to clarify that CERNY.schema is not intended as a JSON validator. I originally made the incorrect logical leap that it was. It is an easy step from “validator for JavaScript objects” to “JSON validator” because JSON documents are just serialized JavaScript objects. However, the author of CERNY.schema informed me that JSON validation is not the intended use of CERNY.schema.
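Part of the disagreement in the comments turns on what a JSON document actually carries: literal data (objects, arrays, strings, numbers, booleans, null), but no type names or methods. A short round trip illustrates this. Python stands in for JavaScript in this sketch, and the `Point` class is hypothetical; the standard library calls (`json.dumps`, `json.loads`, `dataclasses.asdict`) are real.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Point:  # hypothetical example type
    x: int
    y: int

    def norm_squared(self):
        return self.x * self.x + self.y * self.y


p = Point(3, 4)

# json.dumps refuses an arbitrary object outright...
try:
    json.dumps(p)
except TypeError as exc:
    print("not serializable:", exc)

# ...so we flatten it to plain data first. The class and its method are lost.
text = json.dumps(asdict(p))
print(text)                              # {"x": 3, "y": 4}

decoded = json.loads(text)
print(type(decoded).__name__)            # dict
print(hasattr(decoded, "norm_squared"))  # False

# Other type distinctions disappear too: a tuple comes back as a list.
print(json.loads(json.dumps((1, 2))))    # [1, 2]

# Rebuilding a Point only works because the receiver already knows the type.
restored = Point(**decoded)
print(restored.norm_squared())           # 25
```

Reattaching behavior to decoded data, as described in the comments, therefore depends on the receiver already knowing what the data is supposed to be, which is exactly the knowledge a schema tries to capture.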

18 comments on “JSON Schema Definition Languages”

  1. -

    Classifying my schema language as a JSON schema language is simply wrong. I honestly do not know how you came to this conclusion, since it is never stated anywhere in the documentation. A Cerny schema is, as the first sentence in the page you link to explains, meant for “validating a JavaScript object against a schema” in order to simplify programs by reducing the number of checks you have to make.

    Furthermore, I must disagree that a schema is mainly targeted towards human consumption. If so, why not use prose?

  2. - Post author

    Mr Cerny, the page I linked to is indeed quite explicit about the
    schema being intended for validating JavaScript objects. However, it
    is a very small leap from that to being able to validate JSON
    documents, since they are merely serializations of JavaScript objects.
    Based on your response here, I now know that is not the intended use of
    Cerny. That being said, I have a hard time imagining a situation in
    which I would need an object validation system except when the objects
    are coming from a potentially unreliable source, such as a serialized
    form.

    As for why not just use prose: I want to be able to describe
    documents more concisely and precisely than prose usually allows.
    However, even formal descriptions of formats usually need some prose
    attached to clarify certain aspects.

  3. -

    JSON is not a serialization format for JavaScript objects. The document that is created by toJSON does not carry any information about types, nor does it carry any methods. Thank Crockford; otherwise it would be as useful for data interchange as the Java serialization format ;-)

    Now if you create a JavaScript object from a JSON document on the client, you can augment the object with methods or pass it to some constructor. How do I know that the methods will work correctly? I can either clutter my code with checks or use a Cerny schema, which has the additional benefit that validation can be turned off once the program runs correctly, resulting in better performance in production and in simpler (more maintainable) code. But during development it is useful to have validation turned on, since in large projects that run for several months up to a year, data usually changes. And unfortunately it changes quicker than documentation. So, to cut a long story short, I use it to define my media types for both humans (developers) and machines. The requirement for a JavaScript interpreter is acceptable for me.

    I see that there is a need to update my schema guide with some use cases.

  4. -

    JSON Schema Definition Languages…

    Peter Williams: We recently settled on using JSON as the preferred format for the REST-based distributed application on which I am working. We don’t need the expressiveness of XML and JSON is a lot cheaper to generate and parse, particularly in R…

  5. -

    Count me in as someone interested in ‘JSON schema’. Check my blog.

  6. - Post author

    I did not mean that JSON was a fully capable serialization mechanism
    for arbitrary JavaScript objects. But it does, if you care to see it
    this way, encode a set of literal JavaScript objects. I usually do
    not think of it that way because I am primarily working in Ruby these
    days. However, for my schema language search I decided to expand my
    world view a bit to see what would turn up.

    After I wrote the previous comment I spent a bit more time in the
    CERNY.* pages. It had not occurred to me that you could think of
    design-by-contract checks as a sort of schema validation. It is a neat
    approach. And in that context, procedural validation is probably
    required, at least some of the time.

  7. -

    If “[…] the formalist in [you] is screaming out for a concise way to describe the JSON documents [you] accept and emit”, why don’t you either
    a) Shut him up and keep using JSON. This is often a good way to deal with formalists.
    or
    b) Go XML – which does exactly what your inner formalist is screaming for

    I completely fail to see the point of encumbering JSON with XML-ish baggage.

  8. -

    Have you looked at JSPON? JSPON does include a specification for object structure definition, i.e. schemas. If JSPON can be augmented to better suit your needs, I would be glad to discuss evolving JSPON for better validation. I would really like to see further interoperability in this area, so let me know if JSPON would be helpful, or if there is anything I could do to make JSPON useful for what you are doing.
    kriszyp@xucia.com

  9. -

    @Peter you say: “JSON is not a serialization format for JavaScript objects”. If not, then what is it in a practical sense?

    Regarding the perceived baggage of XML schemas, I just have to say that I have no problems with XML schemas; they work just fine for the most part, and XSD is better than XDR, which is better than DTDs, so we are at least moving forward. Regarding XML schema validation, I have typically had to shut it off due to the performance costs.

    What I typically use XML schemas for is to help generate valid XML inputs to non-REST web services. Also, they can be used to auto-generate UIs. They are also useful for generating “Beans” (if the schema is good) when using Java/C# or similar.

    I would love to see JSON schemas for the same reasons. I could auto-generate things like registration forms, we could finally have a standard JSON format for things like RSS, Atom, OPML, etc.

    JSON is the best transport for client apps written in javascript, bringing some order to the party would not hurt if it is done right.

    This blog entry seems to be off to a good start, but it would really need to be enhanced to be of any real use: http://www.epiphantastic.com/?p=22

    Standards and structure are good, so long as they are not for their own sake.

  10. - Post author

    Brian, I actually do think that JSON is a serialization format for JavaScript objects. It was Robert Cerny who claimed it was not. In his defense, though, it would not be feasible to use JSON, as it is defined today, as a general-purpose serialization format.

    Your JSON schema is interesting. However it feels a lot like Kwalify. Any reason you did not just implement support for Kwalify schemas?

  11. -

    So is there any news on a JSON schema language similar to compact Relax NG? Or did you end up writing one?

  12. -

    Hatem Nassrat, I have no news… I ended up just using Kwalify. It works for the sorts of validations I want to perform (i.e., automated acceptance tests). And when combined with an example or two, it makes workable documentation of the formats we use.

    I am not especially happy with the results but it does work.

  13. -

    I have been searching the web for such a schema language and have managed to find one described at http://giftfile.org/depot/home/acarrico/json/json-rng.txt.

    As the link describes, it is an attempt at a JSON Relax NG schema, and quite a good one in fact. It seems to be in an early stage of development, by which I mean there may not be validators written to parse this schema yet.

    I will be pursuing this language as it seems promising. Otherwise, Kwalify may be my only other choice, but a JSON schema written in JSON does not seem so wise. Already this new schema allows for more than Kwalify.

  14. -

    JSON schema languages are unnecessary. That you are not entirely sure it’s necessary, in any strict sense, means you are on the right track.

    It is equally obvious that having a standard means of specifying the structure and type of JSON objects has the potential to add great value. Given that, you might be on the wrong track when you say “document validation is overrated”. That sort of generalization really doesn’t provide any insight without some sort of context.

    Now, as far as the stipulation that “schema languages should be designed to communicate the structure of documents to humans, and only incidentally to validator programs”, I’m putting that in the “wrong track” column. Maybe I’m missing something, but it doesn’t make a lick of sense. You can use C#, Java, Ruby or whatever to define your JSON structures, and check them with the compiler.

    A JSON schema should primarily exist to allow applications to validate JSON instances. Tools could generate schemas from objects or vice versa. Of course it should be human readable, that’s a given.

  15. -

    Here’s a good example that meets Peter’s primary objective for a schema.

    class Foo
    {
        int? bar;   // Obviously nullable.
        int[] baz;  // I’m not convinced it’s good if this can be null in the strictest sense. Should have a minimum of two elements, maximum of 8.
        string qux; // This probably ought to be a proper name with no funny characters.
    }

  17. -

    The itemscript project includes a JSON schema definition language and a validator, also a JSON application markup (itemscript JAM).

    itemscript also provides a reference client implementation called Item Lens and a reference server called Item Store.
