HTML is domain specific

The partisans of generic media types sometimes hold up HTML as an example of how much can be accomplished without domain specific media types. HTML doesn’t have application/business specific semantics and the whole human facing web uses it, so machine clients should be able to use a generic media type too. There is just one flaw with this logic. HTML is domain specific in the extreme. HTML provides strong semantics for defining document oriented user interfaces. There is nothing generic about HTML.

In the HTML ecosystem, the generic format is SGML. Nobody uses SGML out of the box because it is too generic. Instead, various SGML applications, such as HTML, are created with the appropriate domain semantics to be useful. HTML would not have been very successful if it had just defined links via the a element (which is all you need to have hypermedia semantics) and left it up to individual web sites to define what various other elements meant.

The programs we use on the WWW almost exclusively use the strongly domain specific semantics of HTML. Browsers, for example, render HTML to the screen based on the specified semantics. We have web readers which adapt HTML, which is fundamentally visually oriented, for use by the visually impaired. We have search engines which analyze link patterns and human readable text to provide good indexing. We have super smart browsers which can often fill in forms for us. They can do these things because of the clear, domain specific semantics of HTML.
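
To make that concrete, here is a minimal sketch of the kind of thing a crawler can do on any page, for any site, purely because HTML defines what the a element means. It uses Python's standard html.parser module; the LinkCollector and extract_links names and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> elements (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTML itself says <a href> is a hyperlink, so no site-specific
        # knowledge is needed to recognise one.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href of every <a> element in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up sample page, for illustration only.
SAMPLE_PAGE = """
<html><body>
  <p>See the <a href="/accounts">accounts</a> and <a href="/help">help</a> pages.</p>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))
```

The program knows a link when it sees one, but it has no idea that /accounts has anything to do with banking. That part is only in the prose.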

Programs don’t, generally, try to drive the human facing web to accomplish specific application/business goals because the business semantics are hidden in the prose, lists and labels. Anyone who has tried is familiar with the fragility of web scraping. These semantics, and therefore any capabilities based on them, are unavailable to machine clients of the HTML based web because the media type does not specify those semantics. Media types which target machine clients should bear this in mind.
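
The fragility is easy to show. The sketch below is illustrative only: the two page snippets, the JSON document and its "balance" field are invented, and the JSON stands in for some hypothetical banking media type that actually defines what a balance is. The scraper has to guess the business meaning from a label, so a change of wording breaks it. The second reader relies on a field whose meaning would be specified by the media type.

```python
import json
import re

# Two renderings of the same made-up account page; only the label wording differs.
PAGE_V1 = "<ul><li>Balance: <b>120.50</b></li></ul>"
PAGE_V2 = "<ul><li>Available funds: <b>120.50</b></li></ul>"

# A document in a hypothetical banking format where "balance" has a defined meaning.
BANKING_DOC = '{"balance": "120.50", "currency": "USD"}'


def scrape_balance(html_text):
    """Guess the balance from the human readable label; None if the guess fails."""
    match = re.search(r"Balance:\s*<b>([\d.]+)</b>", html_text)
    return match.group(1) if match else None


def read_balance(document_text):
    """Read the balance from the field the (hypothetical) media type defines."""
    return json.loads(document_text)["balance"]


print(scrape_balance(PAGE_V1))    # the label matches, so the scraper finds a value
print(scrape_balance(PAGE_V2))    # the wording changed, so the scraper finds nothing
print(read_balance(BANKING_DOC))  # no guessing from prose is involved
```

Both HTML snippets are perfectly good HTML and mean the same thing to a human. Nothing in the media type tells a program that either of them contains a balance.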

Comments 4

  1. Mike Kelly wrote:

    When people use the term “domain specific” when talking about media types, they mean application domains such as ‘banking’, ‘email’, ‘project management’, etc.

    HTML is not specific to any of these domains and yet you can use it to build applications in them.

    Yes you can call “representing renderable media and links graphically within a window” a domain, and therefore claim that HTML is specific in that sense but I don’t think you actually gain anything by doing that. I definitely would not agree that “HTML is domain specific in the extreme”.

    Perhaps we just need better terminology or better clarification of what we are already using? iirc mamund’s book actually covers this?

    Posted 26 Sep 2012 at 9:16 am
  2. Peter Williams wrote:

    By this logic you could call some hypothetical banking media type generic because it could be used to implement a terrorist tracking application, or part of an ecommerce app, or a macro economics modeling app, etc. The measure of the genericity of a media type is not what applications can be implemented with it, but the semantics it provides.

    Better terminology might help, but I suspect our disagreement is not just one of terminology.

    Posted 26 Sep 2012 at 5:25 pm
  3. Mike Kelly wrote:

    the breadth of applications that can be (sensibly) implemented with a media type is a very good indication of its genericity.

    All of the semantics in HTML are generic. Hence why an HTML document is empty until some specific domain is expressed with it. It establishes a generic interface with which we can drive interactions across many different problem domains.

    that’s what most people mean when they say HTML is generic.

    Posted 27 Sep 2012 at 5:36 am
  4. Peter Williams wrote:

    Mike Kelly wrote:
    > the breadth of applications that can be (sensibly) implemented
    > with a media type is a very good indication of its genericity.

    I agree. HTML allows basically any application whose primary function is to display documents for human consumption. A hypothetical banking media type would allow basically any application whose primary function is about the flow of money between people and organizations. The sets of applications certainly overlap, but I am not at all sure which is larger. Of the applications implemented today, the “display stuff to humans” set is probably larger, but as we get better and better at automation will humans remain integral to so many applications? I sure hope not.

    Posted 28 Sep 2012 at 9:35 am