Would someone please think of the client developers?!?

It seems that most APIs, particularly internal ones, are designed not for ease of use but for ease of implementation. No one would expect a human-facing product designed that way to succeed. We should not expect APIs to be any different.

Web APIs are products in their own right. That means all the rules for building great products, like understanding your users and their use cases, apply. APIs are not just high-latency, bandwidth-hogging database connections. Rather, an API should expose an application and the business value it provides. This means understanding what clients want to accomplish and then affording those uses in easy, intuitive ways.

Communication with users is the key to designing a great API. As with other types of products, it is often necessary to build the first version of an API before any developers are using it. We are on shaky ground until our design is validated by actual clients. As soon as there are actual, or even potential, client developers, listening to them and integrating their feedback should be priority number one.

Listening doesn’t mean reflexively implementing every whim of users (users are not always right about the details). Rather, by understanding what they are trying to accomplish, we as API designers can build systems that afford those goals with a minimum of effort on the part of client developers. Facilitating that value creation should be our main goal as API designers.

Embedding

Designing the messages (or representations; I’ll use the terms interchangeably) is the most important part of API design. If you get the messages right, everything else will flow naturally from them. Of course, there are trade-offs that must be made when designing messages. One of those trade-offs is how much data to put in each message. If messages are too small, clients must make too many calls, which hurts performance. If they are too big, generating, transferring, and parsing them will be excessively slow.

Any entity worth discussing should have a URI of its very own; that is, it should be a resource. This means that we often (read: almost always) end up with a lot of resources that don’t really have much data directly. The usual pattern is that they have a small number of properties and then link to a bunch of other resources. For example, consider an invoice: a few properties, such as the purchase date, and then links to the customer, billing address, shipping address, and line items. The line items would, in turn, link to a product.

We often bulk up the representations of these lightweight resources by directly embedding representations of the other resources to which they link. This tends to reduce the number of requests needed because those embedded representations don’t need to be requested explicitly. This approach has substantial downsides, at least if implemented naively. Consider the following representation of an invoice with embedded representations.

 
{"purchase_date" : "2012-10-29T04:00Z",
 "customer"      :     
   {"uri" : "http://example.com/custs/42",
    "name": "Peter Williams",
    //...
   },
 "billing_address" :     
   {"uri"   : "http://example.com/addrs/24",
    "line1" : "123 Main St",
    //...
   },
 // etc, etc
 "line_items" :
   [{"uri"     : "http://example.com/li/84",
     "quantity": 3,
     "product" :         
       {"uri" : "...",
        "name": "Blue widget",
        "desc": "..."
        }
    },
    // other line items here
   ] 
} 

This approach is very appealing. All the data needed to display or operate on an invoice is right there at our fingertips, which keeps the number of requests to a minimum. The data is also arranged in a logical way that makes sense to our human brains.

For all of its upsides, the downsides to this approach are substantial. The biggest issue, to my mind, is that it limits our ability to evolve this message over time. By directly embedding the line item and product data, for example, we are signalling that they are fundamentally part of this representation. Clients will implement code assuming those embedded resources are always there. That means we can never remove them without breaking clients.
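To make the risk concrete, here is a small Python sketch of a hypothetical client written against the hierarchical invoice above. The function name product_names and the trimmed-down invoice data are illustrative assumptions, not anything a real API defines. The client reaches directly into the nested structure, so the day the server replaces the embedded product with a link, it fails.

```python
# A hypothetical client written against the hierarchically embedded invoice.
# It reaches straight into the nested structure, so it silently depends on
# the server always embedding line items *and* their products.
def product_names(invoice: dict) -> list:
    return [item["product"]["name"] for item in invoice["line_items"]]


embedded_invoice = {
    "purchase_date": "2012-10-29T04:00Z",
    "line_items": [
        {"uri": "http://example.com/li/84", "quantity": 3,
         "product": {"uri": "http://example.com/prods/12", "name": "Blue widget"}},
    ],
}

# Later the server stops embedding products and sends only a link instead.
slimmed_invoice = {
    "purchase_date": "2012-10-29T04:00Z",
    "line_items": [
        {"uri": "http://example.com/li/84", "quantity": 3,
         "product_uri": "http://example.com/prods/12"},
    ],
}

print(product_names(embedded_invoice))  # ['Blue widget']
try:
    product_names(slimmed_invoice)
except KeyError as missing:
    print("client broke, missing key:", missing)  # client broke, missing key: 'product'
```

Nothing about the server change was wrong in itself; the break comes entirely from the client having been invited to treat the embedded data as a permanent part of the invoice.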

There are many reasons we might want to remove those embedded representations. We might start seeing invoices with a lot of line items, resulting in excessively large messages. We might add a lot of properties to products and make the messages too large that way. We might move products to a different database and find that looking them all up takes too long. These are just a few of the innumerable reasons we might want to change our minds about embedding.

How small is too small?

Given that removing a property from a representation is a breaking change, are there ways to design representations that reduce the likelihood that we will need to remove properties in the future? The only real way is to make representations as small as possible. We will never need to remove a property that was never added in the first place. We already discussed how messages that are too small can result in excessive numbers of requests, but is that really true?

Applying the YAGNI (“you aren’t gonna need it”) principle is in order when thinking about embedding. Embedding is easy to do and very super extremely hard to undo. It should be avoided until it is absolutely necessary. We will know it is absolutely necessary when, and only when, we have empirical evidence showing that now is the time. This will happen quite rarely in practice. Even when we have empirical evidence that our request volume is too high, solutions other than embedding are usually a better choice. Caching, in particular, can ameliorate most of the load problems we are likely to encounter. The fastest way to get a representation is not to embed it in another message that is passed over the wire, but to fetch it from a local cache and avoid the network altogether.
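A local cache does not need to be elaborate to pay off. The following is a minimal Python sketch, assuming a hypothetical fetch callable that performs the GET and reports how long the response may be treated as fresh (for example, a max-age taken from a Cache-Control header). The RepresentationCache class is illustrative only; a real client would more likely use an existing HTTP caching library.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical transport: returns (representation, max_age_seconds).
Fetch = Callable[[str], Tuple[dict, float]]


class RepresentationCache:
    """Minimal client-side cache: serve fresh copies locally, go to the network otherwise."""

    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[dict, float]] = {}  # uri -> (body, expires_at)

    def get(self, uri: str) -> dict:
        entry = self._entries.get(uri)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]  # still fresh: no network round trip at all
        body, max_age = self._fetch(uri)  # missing or stale: fetch and remember
        self._entries[uri] = (body, self._clock() + max_age)
        return body


# Usage with a fake transport that records every network call it makes.
calls = []

def fake_fetch(uri: str) -> Tuple[dict, float]:
    calls.append(uri)
    return {"uri": uri, "name": "Blue widget"}, 60.0

cache = RepresentationCache(fake_fetch)
cache.get("http://example.com/prods/12")
cache.get("http://example.com/prods/12")
print(len(calls))  # 1
```

Repeated requests for the same product within its freshness lifetime never touch the network, which is the saving embedding was supposed to buy, without baking anything into the invoice representation.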

Embedding one representation inside another is an optimization. Be sure it is not premature before proceeding.

sometimes – not often, but sometimes – i like the idea of embedding

Annoyingly, sometimes optimizations really are required. In situations where we have clear empirical evidence that the current approach produces too many requests, we have already implemented caching, and we cannot think of another way to solve the problem, embedding can be useful. Even in these situations, embedding should not be done hierarchically as in the example above. Rather, we should sequester the embedded representations off to the side so that it is clear to clients that they are an optimization. If we can also signal that clients should not assume they will always be embedded, all the better.

The following is an example of how this might be accomplished using our previous example.

 
{"purchase_date"       : "2012-10-29T04:00Z",
 "customer_uri"        : "http://example.com/custs/42",
 "billing_address_uri" : "http://example.com/addrs/24",
 "shipping_address_uri": "http://example.com/addrs/24",
 "line_item_uris"      :
   ["http://example.com/li/84",
    "http://example.com/li/85"],
 "embedded":
   [{"uri" : "http://example.com/custs/42",
     "name": "Peter Williams",
     //...
    },
    {"uri"   : "http://example.com/addrs/24",
     "line1" : "123 Main St",
     //...
    },
    {"uri"     : "http://example.com/li/84",
     "quantity": 3,
     "product_uri" : "http://example.com/prods/12"
    },
    {"uri" : "http://example.com/prods/12",
     "name": "Blue widget",
     "desc": "..."
    },
    // and so on and so forth
   ] 
} 

Properties ending in _uri or _uris are links. A client looks for the relationship it needs and then first looks in the embedded section for a representation with the required URI. If it finds one, a network request has been avoided; if not, it can make a request to get the needed data. This approach clearly identifies representations that are embedded as an optimization and makes it easy for clients to behave correctly without relying on that optimization.
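The client-side lookup described above might look something like the following Python sketch. The names embedded_index and resolve, and the fetch callable (a stand-in for a real HTTP GET), are hypothetical. The client is written so that it keeps working if the server stops embedding anything: a missing embedded section just means more calls to fetch.

```python
from typing import Callable, Dict


def embedded_index(root: dict) -> Dict[str, dict]:
    """Map each sidecar representation's URI to the representation itself."""
    return {rep["uri"]: rep for rep in root.get("embedded", [])}


def resolve(root: dict, uri: str, fetch: Callable[[str], dict]) -> dict:
    """Return the representation at `uri`, preferring an embedded copy if present."""
    rep = embedded_index(root).get(uri)
    if rep is not None:
        return rep      # optimization hit: no network request needed
    return fetch(uri)   # not embedded (today or ever): just ask for it


# Usage against a trimmed-down version of the invoice above.
invoice = {
    "customer_uri": "http://example.com/custs/42",
    "billing_address_uri": "http://example.com/addrs/24",
    "line_item_uris": ["http://example.com/li/84"],
    "embedded": [
        {"uri": "http://example.com/custs/42", "name": "Peter Williams"},
        {"uri": "http://example.com/li/84", "quantity": 3,
         "product_uri": "http://example.com/prods/12"},
        {"uri": "http://example.com/prods/12", "name": "Blue widget"},
    ],
}

fetched = []

def fetch(uri: str) -> dict:
    fetched.append(uri)  # stand-in for a real GET
    return {"uri": uri, "line1": "123 Main St"}

customer = resolve(invoice, invoice["customer_uri"], fetch)
line_item = resolve(invoice, invoice["line_item_uris"][0], fetch)
product = resolve(invoice, line_item["product_uri"], fetch)
address = resolve(invoice, invoice["billing_address_uri"], fetch)

print(customer["name"], "/", product["name"], "/", address["line1"])
# Peter Williams / Blue widget / 123 Main St
print(fetched)  # ['http://example.com/addrs/24']
```

Because the sidecar is indexed by URI rather than nested under properties, the same lookup also works for links found inside other embedded representations, such as the line item's product_uri.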

This flat embedding is the approach taken by both HAL and Collection+JSON (albeit with some slightly different nuances). I suspect that the developers of both formats have experienced firsthand the pain of representations growing too big while being unable to reduce their size without breaking clients. If one of those formats works for you, use it; they have already solved a lot of these problems.
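For comparison, HAL puts embedded resources in an _embedded object keyed by link relation, alongside a _links object whose entries carry an href. The Python sketch below follows a single-valued relation in that style. The follow function, the relation names, and the example data are illustrative assumptions (real HAL APIs often use registered relation types or CURIEs), and array-valued relations are deliberately left out.

```python
from typing import Callable


def follow(resource: dict, rel: str, fetch: Callable[[str], dict]) -> dict:
    """Follow a single-valued HAL link relation, preferring an `_embedded` copy."""
    embedded = resource.get("_embedded", {}).get(rel)
    if embedded is not None:
        return embedded                      # server chose to embed it this time
    return fetch(resource["_links"][rel]["href"])  # otherwise dereference the link


hal_invoice = {
    "_links": {
        "self": {"href": "http://example.com/invoices/7"},
        "customer": {"href": "http://example.com/custs/42"},
        "billing_address": {"href": "http://example.com/addrs/24"},
    },
    "purchase_date": "2012-10-29T04:00Z",
    "_embedded": {
        "customer": {
            "_links": {"self": {"href": "http://example.com/custs/42"}},
            "name": "Peter Williams",
        },
    },
}

fetched = []

def fetch(href: str) -> dict:
    fetched.append(href)  # stand-in for a real GET
    return {"_links": {"self": {"href": href}}}

print(follow(hal_invoice, "customer", fetch)["name"])  # Peter Williams
follow(hal_invoice, "billing_address", fetch)
print(fetched)  # ['http://example.com/addrs/24']
```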

Other considerations

Avoiding hierarchical embedding also makes documenting your representations easier. With the sidecar style, you can keep each representation to a bare minimum size and only have to document one “profile” of representation for each flavor of resource you have. With this approach there is no difference between the representation of a customer when it is embedded versus when it is the root representation.

HTML is domain specific

The partisans of generic media types sometimes hold up HTML as an example of how much can be accomplished without domain-specific media types. The argument goes like this: HTML doesn’t have application- or business-specific semantics, yet the whole human-facing web uses it, so machine clients should be able to use a generic media type too. There is just one flaw in this logic: HTML is domain specific in the extreme. HTML provides strong semantics for defining document-oriented user interfaces. There is nothing generic about HTML.

In the HTML ecosystem, the generic format is SGML. Nobody uses SGML out of the box because it is too generic. Instead, various SGML applications, such as HTML, are created with the appropriate domain semantics to be useful. HTML would not have been very successful if it had just defined links via the <a> element (which is all you need to have hypermedia semantics) and left it up to individual web sites to define what the other elements meant.

The programs we use on the WWW almost exclusively rely on the strongly domain-specific semantics of HTML. Browsers, for example, render HTML to the screen based on its specified semantics. We have screen readers which adapt HTML (which is fundamentally visually oriented) for use by the visually impaired. We have search engines which analyze link patterns and human-readable text to provide good indexing. We have super smart browsers which can often fill in forms for us. They can do these things because of the clear, domain-specific semantics of HTML.

Programs don’t, generally, try to drive the human-facing web to accomplish specific application/business goals, because the business semantics are hidden in the prose, lists, and labels. Anyone who has tried is familiar with the fragility of web scraping. These semantics, and therefore any capabilities based on them, are unavailable to machine clients of the HTML-based web because the media type does not specify those semantics. Media types that target machine clients should bear this in mind.
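A small Python illustration of the difference, using the standard library's html.parser: a generic program can reliably pull out links and form fields because HTML defines what <a> and <input> mean, but nothing in HTML says which piece of text is the amount due. The page content, including the class name x7, is made up for the example.

```python
from html.parser import HTMLParser


class HtmlSemantics(HTMLParser):
    """Collects only what HTML itself defines: link targets and form field names."""

    def __init__(self):
        super().__init__()
        self.links, self.fields = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # list of (name, value) pairs -> dict
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "input" and "name" in attrs:
            self.fields.append(attrs["name"])


page = """
<p>Invoice for <b>Peter Williams</b>. Amount due: <span class="x7">$42.00</span></p>
<a href="/invoices/1/pay">Pay now</a>
<form action="/search"><input name="q"></form>
"""

parser = HtmlSemantics()
parser.feed(page)
print(parser.links)   # ['/invoices/1/pay']
print(parser.fields)  # ['q']
# Nothing in HTML marks "$42.00" as the amount due; a scraper has to guess from
# the prose or from a class name like "x7" that may change on the next deploy.
```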

Push or pull?

The question of how to communicate events often comes up when designing APIs and multi-component systems. The question basically boils down to this: should events be pushed to interested parties as they occur, or should interested parties poll for new data?

The short answer: interested parties should poll for new data.

The longer answer is, of course, it depends.

Polling is the only approach that scales to internet levels. For the smaller scales of internal multi-component systems, the answer is much less clear cut. A push approach can clearly be implemented in such environments using either web hooks or XMPP. Such approaches often appear to be simpler and more efficient than their pull equivalents, and they are definitely lower latency.

The appearance of simplicity is, unfortunately, an illusion. Event propagation using push is only easy if you are willing to give up a lot of reliability and predictability. It is easy to say “when an event occurs I will just POST it to the registered URI(s)”. That would be easy, but the world is rarely that simple. What if the receiving server is down or unreachable? Are you going to retry? If so, how many times? If not, is that level of message loss acceptable to all the interested parties? If the receiving system is very slow, will that cause a backlog in the sending system? If a lot of events happen in a very short period of time, can the receiving system handle the load?
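To see how quickly “just POST it” grows, here is a Python sketch of push delivery with retries and exponential backoff. The deliver function, its parameters, and the post callable (assumed to return True on a 2xx response) are hypothetical. Even this sketch answers none of the hard questions: it only hands failures back to the caller, and because it delivers sequentially, one unreachable subscriber stalls everyone behind it.

```python
import time
from typing import Callable, List, Tuple


def deliver(event: dict,
            subscribers: List[str],
            post: Callable[[str, dict], bool],
            max_attempts: int = 5,
            base_delay: float = 0.5,
            sleep: Callable[[float], None] = time.sleep) -> List[Tuple[str, dict]]:
    """Push one event to each subscriber, retrying with exponential backoff.

    Returns the (uri, event) deliveries that never succeeded.
    """
    undelivered = []
    for uri in subscribers:                  # sequential: a slow receiver delays the rest
        for attempt in range(max_attempts):
            if post(uri, event):
                break
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # the sender is blocked while it waits
        else:
            undelivered.append((uri, event))    # now what? store, alert, or drop?
    return undelivered


# Usage: one receiver is unreachable; record the backoff delays instead of sleeping.
delays = []

def flaky_post(uri: str, event: dict) -> bool:
    return uri != "http://down.example.com/hook"

failed = deliver({"type": "invoice.paid"},
                 ["http://ok.example.com/hook", "http://down.example.com/hook"],
                 flaky_post, sleep=delays.append)
print(failed)  # [('http://down.example.com/hook', {'type': 'invoice.paid'})]
print(delays)  # [0.5, 1.0, 2.0, 4.0]
```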

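The efficiency benefits of a push approach are real, but not nearly as significant as they first appear. HTTP’s conditional request mechanism (an ETag echoed back in If-None-Match, or Last-Modified echoed back in If-Modified-Since), when used effectively, reduces the cost of polling to quite low levels: if nothing has changed, the server answers with a bodiless 304 Not Modified.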
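As a rough sketch of how cheap conditional polling can be, the following Python class remembers the ETag from the last 200 response and sends it back in If-None-Match. The send callable (standing in for a real HTTP client) and the ConditionalPoller name are assumptions for illustration. When nothing has changed, most polls cost little more than a round trip.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: send(uri, request_headers) -> (status, response_headers, body)
Send = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], Optional[bytes]]]


class ConditionalPoller:
    """Polls a URI, sending If-None-Match so unchanged data costs only a 304."""

    def __init__(self, uri: str, send: Send):
        self.uri, self._send = uri, send
        self._etag: Optional[str] = None
        self.body: Optional[bytes] = None

    def poll(self) -> bool:
        """Return True if new data arrived since the last poll."""
        headers = {"If-None-Match": self._etag} if self._etag else {}
        status, resp_headers, body = self._send(self.uri, headers)
        if status == 304:
            return False  # nothing changed; no body was transferred
        if status == 200:
            self._etag = resp_headers.get("ETag")
            self.body = body
            return True
        raise RuntimeError(f"unexpected status {status}")


# Usage with a fake server (header names compared case-sensitively for brevity;
# real HTTP header names are case-insensitive).
feed = {"etag": '"v1"', "body": b"[event 1]"}

def fake_send(uri, headers):
    if headers.get("If-None-Match") == feed["etag"]:
        return 304, {}, None
    return 200, {"ETag": feed["etag"]}, feed["body"]

p = ConditionalPoller("http://example.com/events", fake_send)
first = p.poll()    # initial fetch
second = p.poll()   # unchanged: 304
feed.update(etag='"v2"', body=b"[event 1, event 2]")
third = p.poll()    # changed: new body
print(first, second, third)  # True False True
```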

Pull is cool

APIs should be built around pulling data unless there is a particular functional concern that makes pull unworkable (e.g. low message latency being very important). Any push approach will have all the complexities of a pull approach (to handle reliability issues), combined with much less predictable behavior, because its performance will depend on one or more other systems’ ability to handle the event notification workload.