Services Questions

I recently had a colleague ask me several questions about service oriented architectures and breaking monoliths apart. This is an area in which i have a good deal of experience so i decided to publish my answers here.

What is a “service”?

A “service” is a discrete bit of functionality exposed via a well defined interface (usually a standardized format over a standardized network protocol) that can be utilized by clients that are unknown and/or unanticipated at the time of the service’s implementation. Due to the well defined interface, clients of a service do not need to understand how the service is implemented. This style of software architecture exists to overcome the diseconomies of scale suffered by software development as systems grow large.

How has the services landscape changed in the last 5-10 years?

In the mid-2000s it became clear that WS-*, the dominant service technology at the time, was dramatically overcomplicated and led to systems that were expensive to maintain. WS-*’s protocol independence, combined with the RPC style adopted by most practitioners, meant that clients generally ended up being very tightly coupled to a particular service implementation.

As WS-*’s deficiencies became clear, REST style architectures gained popularity. They reduce complexity and coupling by utilizing an application protocol such as HTTP, and often using simpler message formats such as JSON. The application protocol provides a uniform interface for all services which reduces the coupling between client and service.

Microservices are a relatively recent variant of service oriented architecture. As the name suggests, the main thrust of microservices is the size of the components that implement them. The services themselves could be message queue based or REST style APIs. The rise of devops, with its automation around deployment and operations, makes it practical to deploy a large number of very small components.

Message queue based architectures are experiencing a bit of a resurgence in recent years. Similar architectures were popular in the early 2000s but were largely abandoned in favor of WS-*. Queue based architectures often provide throughput and latency advantages over REST style architectures at the expense of visibility and testability.

What do modern production services architectures look like?

It depends on the application’s goals. Systems that need high throughput, low latency and extreme scalability tend to be message queue based event driven architectures. Systems that are looking for ease of integration with external clients (such as those developed by third parties) tend to be resource oriented REST APIs with fixed URL structures and limited runtime discoverability. Systems that are seeking long term maintainability and evolvability tend to be hypermedia oriented REST APIs. Each style has certain strengths and weaknesses. It is important to pick the right one for the application.

How granular is a service?

I would distinguish a service from a component. Services should be small, encapsulating a single, discrete bit of functionality. If a client wants to perform an action, that action is an excellent candidate for being a service. A component, on the other hand, is a pile of code that implements one or more services. A component should be small enough that you could imagine re-writing it from scratch in a different technology. However, there is a certain fixed overhead for each component. Finding the balance between component size and the number of components is an important part of designing service architectures.

What is the process to start breaking down a large existing application?

Generally organizations start by extracting authentication and authorization. It is an area that is fairly well standardized (OAuth, SAML, etc.) and is also necessary to support the development of other services. Once authentication and authorization are extracted from the monolith, another bit of functionality is chosen for implementation outside of the monolith. This process is repeated until the monolith is modularized enough to meet the organizational goals. The features implemented outside of the monolith are often new functionality, but to really break apart a monolith you eventually have to target some of the existing features.

Starting small and proceeding incrementally are the keys to success. Small successes make it easier to build consensus around the approach. Once consensus is reached, existing customer facing features can be extracted more readily.

What are the organization / team structure impacts?

It is generally superior to construct a (rough, high level) design of the services needed and to form vertically integrated teams to implement business features across the set of components. Empowering all teams to create new components and, where appropriate, new services, increases the likelihood of success and shortens the time to implement a service oriented architecture.

Alternatively, component structures can follow the organizational structure (Conway’s law), resulting in one component per group in the organization. This approach can be practical in organizations where vertically integrated teams are politically unacceptable.

What are needed / helpful tools? To run services? To manage services?

  • Service discovery. Without it an inordinate amount of configuration is required for each component to interact with the ecosystem of services.
  • Instrumentation and monitoring. Without this it is impossible to detect and remediate issues that arise from the integration of an ecosystem of services.

How do companies document their services, interfaces and versions?

There are not any popular tools for creating documentation for hypermedia and message queue based APIs. General guidance on writing such documentation can be found, but for the most part you are on your own.

For more resource oriented APIs, tools like Swagger and API Blueprint can be helpful.

How can we speed developer ramp up in a service architecture?

To speed developer ramp up it is important to have well maintained scripts that build a sandbox environment including all the services, preferably a single script that can deploy the components to virtual machines or Docker containers on the developer’s machine. Additionally, it is important to maintain searchable documentation about the APIs so that developers can find information about existing services.
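
One possible shape for such a script, sketched here as a Rake task wrapping Docker Compose (the task name is invented, and a docker-compose.yml listing every component is assumed to exist):

# Rakefile (sketch): a single entry point that stands up the whole ecosystem
# locally, assuming each component publishes an image referenced in
# docker-compose.yml.
desc "Build a local sandbox containing every service"
task :sandbox do
  sh "docker-compose pull"    # fetch the current image for each component
  sh "docker-compose up -d"   # start all of the services in the background
end

A developer then runs rake sandbox and gets a working copy of every service to develop against.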

How to deploy new versions of the ecosystem and its components?

Envisioning an ecosystem of services as a single deployable unit is contrary to the goals of service oriented architectures. Rather each component should be versioned and deployed independently. This allows the components to change, organically, as needed. This approach increases costs on the marketing, product management and configuration management fronts. The benefits gained on the development side by avoiding the diseconomies of scale are worth it.

How to manage API compatibility changes?

A service oriented architecture makes breaking API changes more damaging. It is often difficult to know all the clients using a particular API. The more successful an API the harder, and more time consuming, it is to find and fix all clients. This difficulty leads to the conclusion that all changes should be made in a backwards compatible way. When breaking changes are unavoidable (which is rare), server-driven content negotiation can enable components to fulfill requests for both the old API and the new one. Once all clients are transitioned to the new API the old API can be removed. Analytics in the old API can help identify clients that still need to transition and determine when the old API is no longer needed.
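
As a rough sketch of how that negotiation can look on the wire (the vendor media type names are invented for this example), an upgraded client asks for the new representation explicitly, while older clients keep requesting the original media type at the same URI and keep getting it:

>>>
GET /orders/42 HTTP/1.1
Accept: application/vnd.example.order.v2+json
...

<<<
HTTP/1.1 200 OK
Content-Type: application/vnd.example.order.v2+json
...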

How to version the ecosystem and its components?

This is a marketing and project management problem more than anything. The ecosystem will not have a single version number. However, when a certain set of business meaningful features has been completed it is convenient for marketing to declare a new “version”. Such declarations are of little consequence on the development side, so they should be made whenever desirable.

Announcing HalClient (for ruby)

HalClient is yet another ruby client library for HAL based web APIs. The goal is to provide an easy to use set of abstractions on top of HAL without completely hiding the HAL based API underneath. The areas of complication that HalClient seeks to simplify are:

  • CURIE links
  • regular vs embedded links
  • templated links
  • working with RFC6573 collections

Unlike many other ruby HAL libraries HalClient does not attempt to abstract HAL away in favor of domain objects. Domain objects are great but HalClient leaves that to the application code.

CURIEs

CURIEd links are often misunderstood by new users of HAL. Dealing with them is not hard but it requires care to do correctly. Failure to implement CURIE support correctly will result in future breakage as services make minor syntactic changes to how they encode links. HalClient’s approach is to treat CURIEs as a purely over-the-wire encoding choice. Looking up links in HalClient is always done using the full link relation. This insulates clients from future changes by the server to the namespaces in the HAL representations.
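
For illustration, this is roughly what a CURIEd link looks like on the wire (the relation and URIs are invented for this example):

{"_links":
   {"curies": [{"name": "ex",
                "href": "http://example.com/rels/{rel}",
                "templated": true}],
    "ex:widget": {"href": "http://example.com/widgets/42"}}}

A HalClient user looks that link up by its full relation, http://example.com/rels/widget, never by the ex:widget shorthand, so the server remains free to rename or drop the ex prefix later.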

From the client perspective there is very little difference between embedded resources and remote links. The only difference is that dereferencing a remote link will take a lot longer. Servers are allowed to move links from the _links section to the _embedded section with impunity. Servers are also allowed to put half of the targets of a particular rel in the _links section and the other half in the _embedded section. These choices are all semantically equivalent and therefore should not affect clients’ ability to function.

HalClient facilitates this by providing a single way to navigate links. The #related(rel) method provides a set of representations for all the resources linked via the specified relationship, regardless of which section the links are specified in. Clients don’t have to worry about the details of whatever idiosyncratic choices the server may be making today.
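
A rough usage sketch (invoice is assumed to be a HalClient representation obtained elsewhere, and the relation URI is invented):

# The customers may be in _links or _embedded; the caller neither knows nor
# cares which.
invoice.related("http://example.com/rels/customer").each do |customer|
  puts customer.property("name")  # #property is used here only for illustration
end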

Templated links are a powerful feature of HAL but they can be a little challenging to work with in a uniform way. HalClient’s philosophy is that the template itself is rarely of interest. Therefore the #related method takes, as a second argument, a set of options with which to expand the template. The resulting full URL is used to instantiate a new representation. This removes the burden of template management from the client and allows clients to treat templated links very similarly to normal links.
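
A sketch of the same call against a templated link (api is assumed to be a HalClient representation, and the relation and template parameter are invented):

# Given a templated link such as http://example.com/search{?q} registered
# under the search relation, the options expand the template and the result
# is a normal set of representations for the expanded URL.
results = api.related("http://example.com/rels/search", q: "blue widget")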

RFC6573 collections

Collections are a part of almost every application. HalClient provides built in support for collections implemented using the standard item, next and prev link relationships. The result is a Ruby Enumerable that can be used just like your favorite collections. The collection is lazily evaluated so it can be used even for very large collections.
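
A sketch of iterating such a collection (the HalClient.get entry point, and treating the fetched representation as directly enumerable, are assumptions made for the example):

# The first page of an RFC6573 collection; next links are followed behind the
# scenes, so iterating over a very large collection stays lazy.
widgets = HalClient.get("http://example.com/widgets")
widgets.each do |widget|
  puts widget.property("name")
end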

Conclusion

If you are using HAL based web APIs I strongly encourage you to use a client library of some sort. The amount of resilience you will gain, and the amount of minutiae you will save yourself from, will be well worth it. The Ruby community has a nice suite of HAL libraries whose level of abstraction ranges from ActiveRecord style ORM to very thin veneers over JSON parsing. HalClient tries to be somewhere in the middle, exposing the API as designed while providing helpful, commonly needed functionality to make the client application simpler and easier to implement and understand.

HalClient is under active development so expect to see even more functionality over time. Feedback and pull requests are, of course, greatly desired. We’d love to have your help and insight.

Embedding

Designing the messages (or representations, i’ll use the terms interchangeably) is the most important part of API design. If you get the messages right everything else will flow naturally from them. Of course, there are trade offs that must be made when designing messages. One of those trade offs is how much data to put in each message. If they are too small clients must make too many calls to be performant. If they are too big generating, transferring and parsing the messages will be excessively slow.

Any entity worth discussing should have a URI of its very own. That is, it should be a resource. This means that we often (read: almost always) end up with a lot of resources that don’t really have much data directly. The usual pattern is that they have a small number of properties and then link to a bunch of other resources. For example, consider an invoice: a few properties like purchase date, etc., and then links to the customer, billing address, shipping address, and line items. The line items would, in turn, link to a product.

We often bulk up the representations of these lightweight resources by directly embedding representations of the other resources to which they link. This tends to reduce the number of requests needed because those embedded representations don’t need to be requested explicitly. This approach has substantial downsides, at least if implemented naively. Consider the following representation of an invoice with embedded representations.

 
{"purchase_date" : "2012-10-29T4:00Z",
 "customer"      :     
   {"uri" : "http://example.com/custs/42",
    "name": "Peter Williams",
    //...
   },
 "billing_address" :     
   {"uri"   : "http://example.com/addrs/24",
    "line1" : "123 Main St",
    //...
   },
 // etc, etc
 "line_items" :
   [{"uri"     : "http://example.com/li/84",
     "quantity": 3,
     "product" :         
       {"uri" : "...",
        "name": "Blue widget",
        "desc": "..."
       },
    },
    // other line items here
   ] 
} 

This approach is very appealing. All the data needed to display or operate on an invoice is right there at our fingertips, which nicely manages the number of requests that need to be made. The data is also arranged in a logical way that makes sense to our human brains.

For all of its upsides, the downsides to this approach are substantial. The biggest issue, to my mind, is that it limits our ability to evolve this message over time. By directly embedding the line item and product data, for example, we are signalling that they are fundamentally part of this representation. Clients will implement code assuming those embedded resources are always there. That means we can never remove them without breaking clients.

There are many reasons we might want to remove those embedded representations. We might start seeing invoices with a lot of line items, thereby resulting in excessively large messages. We might add a lot of properties to products and make the messages too large that way. We might move products to a different database and find that looking them all up takes too long. These are just a few of the innumerable reasons that we might want to change our minds about embedding.

How small is too small?

Given that removing a property from a representation is a breaking change, are there ways to design representations that reduce the possibility that we will need to remove properties in the future? The only real way is to make representations as small as possible. We will never need to remove a property that was never added in the first place. We already discussed how messages that are too small can result in excessive numbers of requests, but is that really true?

Applying the yagni principle is in order when thinking about embedding. Embedding is easy to do and very super extremely hard to undo. It should be avoided until it is absolutely necessary. We will know it is absolutely necessary when, and only when, we have empirical evidence showing that now is the time. This will happen quite rarely in practice. Even when we have empirical evidence that our request volume is too high, solutions other than embedding are usually a better choice. Caching, in particular, can ameliorate most of the load problems we are likely to encounter. The fastest way to get a representation is not to embed it into another message that is passed over the wire but to fetch it out of a local cache and avoid the network altogether.
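
As a sketch of what that looks like in practice (the header values are only illustrative), the server marks the linked representation as cacheable and cheap to revalidate:

<<<
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: max-age=3600
ETag: "v7"
...

Once the hour is up, a client revalidates with If-None-Match: "v7" and, when nothing has changed, receives a tiny 304 Not Modified instead of the full body.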

Embedding one representation inside another is an optimization. Be sure it is not premature before proceeding.

sometimes – not often, but sometimes – i like the idea of embedding

Annoyingly, sometimes optimizations really are required. In those situations where we have clear empirical evidence that the current approach produces too many requests, we have already implemented caching, and we cannot think of another way to solve the problem, embedding can be useful. Even in these situations embedding should not be done hierarchically as in the example above. Rather we should sequester the embedded representations off to the side so that it is clear to clients that they are an optimization. If we can signal that clients should not assume they will always be embedded, all the better.

The following is an example of how this might be accomplished using our previous example.

 
{"purchase_date"       : "2012-10-29T4:00Z",
 "customer_uri"        : "http://example.com/custs/42",
 "billing_address_uri" : "http://example.com/addrs/24",
 "shipping_address_uri": "http://example.com/addrs/24",
 "line_item_uris"      :
   ["http://example.com/li/84",
    "http://example.com/li/85"],
 "embedded":
   [{"uri" : "http://example.com/custs/42",
     "name": "Peter Williams",
     //...
    },
    {"uri"   : "http://example.com/addrs/24",
     "line1" : "123 Main St",
     //...
    },
    {"uri"     : "http://example.com/li/84",
     "quantity": 3,
     "product_uri" : "http://example.com/prods/12"
    },
    {"uri" : "http://example.com/prods/12",
     "name": "Blue widget",
     "desc": "..."
    },
    // and so on and so forth
   ] 
} 

The _uri and _uris properties are links. A client looks for the relationship it needs and then first looks for a representation in the embedded section with the required uri. If it finds one then a network communication has been avoided, if not it can make a request to get the needed data. This approach clearly identifies representations that are embedded as an optimization and makes it easy for clients to avoid relying on that optimization to behave correctly.
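
A sketch of that lookup against the example format above (Ruby; the resolve function and invoice_doc are invented for illustration):

require "json"
require "net/http"

# Return the representation of the resource identified by `uri`, preferring a
# copy embedded in `doc` and falling back to a normal GET otherwise.
def resolve(doc, uri)
  embedded = (doc["embedded"] || []).find { |rep| rep["uri"] == uri }
  return embedded if embedded

  JSON.parse(Net::HTTP.get(URI(uri)))
end

# invoice_doc is assumed to be the parsed JSON of the invoice response.
customer = resolve(invoice_doc, invoice_doc["customer_uri"])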

This flat embedding is the approach taken by both HAL and Collection+JSON (albeit with some slightly different nuances). I suspect that the developers of both of those formats have experienced first hand the pain of representations getting too big without being able to easily reduce their size without breaking clients. If one of those formats works for you, use it; they have already solved a lot of these problems.

Other considerations

Avoiding hierarchical embedding also makes documenting your representations easier. With the sidecar style you can keep each representation to a bare minimum size and only have to document one “profile” of representation for each flavor of resource you have. With this approach there is no difference between the representation of a customer when it is embedded and when it is the root representation.

HTML is domain specific

The partisans of generic media types sometimes hold up HTML as an example of how much can be accomplished without domain specific media types. HTML doesn’t have application/business specific semantics and the whole human facing web uses it, so machine clients should be able to use a generic media type too. There is just one flaw with this logic. HTML is domain specific in the extreme. HTML provides strong semantics for defining document oriented user interfaces. There is nothing generic about HTML.

In the HTML ecosystem, the generic format is SGML. Nobody uses SGML out of the box because it is too generic. Instead, various SGML applications, such as HTML, are created with the appropriate domain semantics to be useful. HTML would not have been very successful if it had just defined links via the a element (which is all you need to have hypermedia semantics) and left it up to individual web sites to define what various other elements meant.

The programs we use on the WWW almost exclusively use the strongly domain specific semantics of HTML. Browsers, for example, render HTML to the screen based on the specified semantics. We have screen readers which adapt HTML, which is fundamentally visually oriented, for use by the visually impaired. We have search engines which analyze link patterns and human readable text to provide good indexing. We have super smart browsers which can often fill in forms for us. They can do these things because of the clear, domain specific semantics of HTML.

Programs don’t, generally, try to drive the human facing web to accomplish specific application/business goals because the business semantics are hidden in the prose, lists and labels. Anyone who has tried is familiar with the fragility of web scraping. These semantics, and therefore any capabilities based on them, are unavailable to machine clients of the HTML based web because the media type does not specify those semantics. Media types which target machine clients should bear this in mind.

Media types and profiles

Opponents of API versioning using media types often suggest that media type proliferation is a cause for serious concern. The implication is that the more media types that exist, the more different formats intermediates and tools will need to understand in order to be useful. Fortunately, this is just not true. Having lots of media types does not imply having a lot of incompatible formats. Nor does it imply requiring tooling and intermediates to be a lot more complex.

This is difficult to come to terms with, in part, because most media types are not just one thing. The Atom feed for this blog is simultaneously an article syndication document, an XML document processable by any compliant XML parser, a UTF-8 encoded plain text document and an octet stream. All of those are media types. It would be perfectly legal to return an Atom document but set the Content-Type to text/plain, but we generally choose to request and identify Atom feeds using the Atom media type because that provides the most value to the client making the request. Notice that we choose the most, not the least, specific media type.

A downside of using the most specific media type available is that some intermediates are put at a disadvantage. If some intermediate is able to do something useful with XML but does not understand that Atom is XML it might not do that useful thing with our request. On the other hand, using a less specific media type might have the same disadvantage. If we call our Atom document an octet stream intermediates are going to pretty much ignore it. We have a stack of formats each of which is a compatible extension of all the ones below it, but we are only allowed to give it one name. This is bound to leave some components unable to work optimally.

Only being able to specify a single name is the root of the problem, not having lots of compatibly layered formats. Fortunately, the profile link relation provides a solution. You just have to use it in a slightly different way than its proponents currently suggest.

Very specific media types

The client constructing the requests needs to be able to tell the server what it needs to accomplish its goal. If it can work with any old octet stream it can put */* in the Accept header field. If, on the other hand, it is expecting specific information to be provided in an element with a specific id then it needs to be able to let the server know that. A very specific media type combined with content negotiation is a great way to provide this while still allowing substantial flexibility to servers.

Use profile link header for more generic format information

Rather than prevent clients from asking for what they need, servers should decorate responses with profile link headers that provide hints about alternate ways a representation could be processed. This provides intermediates and tooling a way to identify representations they can work with, regardless of what the Content-Type header field says.

Consider an API that uses a media type based on Atom but with extensions. It could register a very specific media type in the vendor tree for that particular flavor of Atom and use that in the Content-Type header field. In addition, it could provide link headers pointing to http://tools.ietf.org/html/rfc4287 (Atom), http://www.w3.org/TR/2006/REC-xml11-20060816/ (XML), and some URI representing plain text. Clients that need the specific extensions to work can make that known. Clients that can work with any old Atom document can request Atom documents and get them with or without the extensions. Intermediates that work with any Atom document can easily detect that these very specifically media typed responses are, in fact, Atom so they can do their job. And if at some later date a standard way to represent this data emerges, the API can add support for it without breaking any of its existing clients, direct or implicit.
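
A response in that style might look roughly like the following (the vendor media type name is invented; the profile URIs are the ones mentioned above):

<<<
HTTP/1.1 200 OK
Content-Type: application/vnd.example.extended-atom+xml
Link: <http://tools.ietf.org/html/rfc4287>; rel="profile"
Link: <http://www.w3.org/TR/2006/REC-xml11-20060816/>; rel="profile"
...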

Background

This particular line of thinking was prompted by Peter Janes pointing out that profiles and media type based versioning might be complementary.

Something has been nagging at me about the approaches to REST API versioning presented by Peter Williams and Mark Nottingham. I’m sure they’re complementary, but I’m not quite grokking how.

That insight really got me thinking. The impact to intermediates of media type versioning has been a nagging issue for me for a while now and i am happy to finally have a solution.

Bookmarks and URI based versioning

Threads about how to version hypermedia (or REST) APIs are legion. I certainly have made my opinion known in the past. That being said, the most common approach in the wild is putting a version number in the URI of the resources which are part of the API. For example, http://api.example.com/v1/products/42.

That approach has the advantage of being simple and easy to understand. Its main downside is that it makes it difficult for existing clients to switch to a newer version of the API if one becomes available. The difficulty arises because most existing clients will have bookmarked certain resources that are needed to accomplish their goals. Such bookmarks complicate the upgrade quite significantly. Clients who want to use an upgraded API must choose to rewrite those bookmarks based on some out of band knowledge, support both the old and new version of the API, or force the user to start over from scratch.

None of these are good options. The simplest, most attractive approach is the first. However, forcing clients to mangle saved URIs reduces the freedom of the server to evolve. The translation between the two versions of the API will have to be obvious and simple. That means you are going to have to preserve key parts of the URI in the new structure. You cannot switch from a numeric surrogate key to a slug to improve your SEO. Likewise, you cannot move from a slug to a numeric surrogate key to prevent name collisions. You never know when the upgrade script will be executed. It could be years from now, so you will also need to maintain those URIs forever. Some clients have probably bookmarked resources that you do not think of as entry points, so you will need to be this careful for every resource in your system.

The second option, forcing clients to support both versions of the API, is even worse than the first. It means that once a particular instance of a client has used the API it is permanently locked into that version of the API. This is horrible because it means that early users cannot take advantage of new functionality in the API. It also means that deprecated versions of the API must be maintained much longer than would otherwise be necessary.

The third option, forcing users to start over from scratch, is what client writers must do if they want to use functionality which is not available in the obsolete version when there is no clear upgrade path between API versions. This is not much work for the client or server implementers but it seriously sucks for the users. Any configuration, and maybe even previous work, is lost and they are forced to recreate it.

A way forward

Given that this style of versioning is the most common, we need a solution. The link header provides one possible solution. We can introduce a link relating the old and new versions of logically equivalent resources. When introducing a breaking API change the server bumps the API version and changes the URIs in any way it likes, e.g. the new URI might be http://example.com/v2/products/super-widget. In the old version of the API a link header is added to responses to indicate the equivalent resource in the new API, using a link relation such as http://example.com/v2/rels/v2-equivalent.

>>>
GET /v1/products/42 HTTP/1.1
...

<<<
HTTP/1.1 200 OK
link: <http://example.com/v2/products/super-widget>; rel="alternate http://example.com/v2/rels/v2-equivalent"
...

Older clients will happily ignore this addition and continue to work correctly. Newer clients will check every response involving a stored URI for the presence of such a link and will treat it as a redirect. That is, they will follow the link and use the most modern variant they support.
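
A sketch of that client-side check (Ruby; the link relation is the one from the example above and the Link header parsing is deliberately naive):

UPGRADE_REL = "http://example.com/v2/rels/v2-equivalent"

# Given the URI the client has stored and the response it just received,
# return the URI it should store from now on.
def current_uri(stored_uri, response)
  header = response["link"].to_s
  upgraded = header.split(",").find { |value| value.include?(UPGRADE_REL) }
  upgraded ? upgraded[/<([^>]+)>/, 1] : stored_uri
end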

If you are really bad at API design you can stack these links. For example, the v1 variants might have links to both the v2 and v3 variants. Chaining might also work, but it would require clients to at least be aware of any intermediate version upgrade link relations so that they could follow the chain to the version they prefer.

You could also add links to the obsolescent variant’s body. This would be almost equivalent except that it requires clients to be able to parse older responses enough to search for the presence of such a link. Using the HTTP link header field nicely removes that requirement by moving the link from the arbitrarily formatted body to the HTTP header which will be supported by all reasonable HTTP clients.

Using URIs to version APIs may not be the cleanest way to implement versioning but the power of hypermedia allows us to work around its most obvious deficiencies. This is good given the prevalence of that approach to versioning.

Vertical Slicing

I am a fan of polylithic architectures. Such architectures have many advantages related to enhancing evolvability and maintainability. When you decide to create a system composed of small pieces how do you decide what functionality goes into which component?

Principles

The goal is to sub-divide the application into multiple highly cohesive components which have only weak connascence with each other. To achieve the desired cohesion it will be necessary to align the component boundaries with natural fissure points in the application.

The strategy should allow for the production of an arbitrary number of components. A component that was of a manageable size yesterday could easily become too large tomorrow. In that situation the over-sized component will need to be sub-divided. Applying the same strategy repeatedly will result in a system that is more easily understood.

We want to minimize redundancy in the components. Redundancy results in more code which must be understood and maintained. More importantly, redundancy usually introduces connascence of algorithm, making changes more error prone and expensive. In a perfect world, any particular behavior would be implemented in exactly one component.

We want to isolate changes to the system. When implementing a new feature it is desirable to change as few components as possible. Each additional component that must be changed raises the complexity of the change. The componentization strategy should minimize the number of components involved in the average change to the system.

With those metrics in mind let’s explore the two most common approaches and see how they compare with each other. Those two patterns of componentization are horizontal slicing and vertical slicing.

Horizontal slicing

In this approach the component boundaries are derived from the implementation domain. The implementation is divided into a set of stacked layers in such a way that a layer initiates communication with the layers below it. This results in a standard layered architecture. Implementing each layer as a separate component achieves the horizontal slicing. This style of componentization results in the very common n-tier architecture pattern.

For example, an application that has a business logic layer and a presentation layer would be divided into two components: a business logic component and a presentation component.

Vertical slicing

In this approach the component boundaries are derived from the application domain. Related domain concepts are grouped together into components. Individual components communicate with any other components as needed.

This approach is also quite common but is usually thought of a lot less formally. It is more common for this type of segmentation to develop incidentally, for example because separate teams developed the parts independently and then integrated them later. Any time you integrate separate applications you have vertical componentization.

The Score

Against the metrics we laid out earlier, vertical slicing does much better than horizontal.

                    Horizontal slicing    Vertical slicing
Cohesion            high                  high
Repeatability       low                   high
DRYness             low                   high
Change isolation    low                   high

Cohesion

Horizontal slicing has high cohesion. Each of the components represents a logically cohesive part of the implementation.

Vertical slicing also has high cohesion. Each component represents a highly cohesive part of the application domain.

Repeatability

Vertical slicing provides a mechanism for reapplying the subdivision pattern an arbitrary number of times. If any component gets too large to manage it can be divided into multiple components based on the application domain concepts. This same process can be repeated, from the initial division of a monolithic application until components of the desired size have been achieved.

Horizontal slicing is less repeatable. The more tiers there are the harder it is to maintain cohesiveness. In practice it is very rare to see a tiered architecture with more than 4 tiers, and 3 tiers is much more common.

DRYness

Horizontal slicing tends to result in some repetition. Certain behaviors will have to be repeated in each layer, for example data validation rules. You will need those in the presentation layer to provide good error messages and in the business logic layer to prevent bad data from being persisted.

Vertical slicing allows you to reduce the connascence of algorithm because any single user activity is implemented in exactly one component. Components usually do end up communicating with each other; however, they do so in a way that does not require the same algorithms to be implemented in multiple components. For any one bit of data or behavior, one component will be its authoritative source.

Change isolation

Vertical slicing tends to allow new features to be implemented by changing only one component. The component changed is the one which already contains features cohesive with the new one.

Horizontal slicing, on the other hand, tends to require changes in every layer. The new feature will require additions to the presentation layer, the business logic layer and the persistence layer. Having to work in every layer increases the cognitive load required to achieve the desired result.

Conclusion

Vertical slicing provides significant advantages. The high cohesion, DRYness, and change isolation combine to drastically reduce the risks and cost of change. That in turn allows better and faster maintenance and evolution of the system. The repeatability allows you to retain these benefits even while adding functionality over time. Each time a component gets too large you can divide it until you have reached a size that is human scaled.

Having a large number of components operate as a system does result in a good deal of communication between the components. It is important to pay attention to the design of the APIs. Poor API design can introduce excessive coupling which will eat up most of the advantages described above. Hypermedia, or more precisely following the REST architectural style, is the best way i know to reduce coupling between the components.

What are links

When designing hypertext formats is it better to provide links for every available action, or to provide links to related resources and let the client use the protocol interface to achieve particular actions on those related resources?

I have leaned in both directions at various times but have never fully convinced myself either way.

To make the issues a bit clearer let me use an example lifted from the article that got me thinking about this most recently.1

<cart>
  <!-- some stuff here -->
  <link rel="http://ex.org/rel/abort" 
        href="http://ex.org/cart/cancel;token=987654321"/>
  <link rel="http://ex.org/rel/add-more" 
        href="http://ex.org/cart/add;token=987654321"/>
  <link rel="http://ex.org/rel/buy" 
        href="http://ex.org/cart/buy;token=987654321"/>
</cart>

I place this example in the “links for every action” camp. Each of the links in the example describes exactly one action.

An alternate approach might look something like this.

<cart>
  <!-- some stuff here -->
  <link rel="http://ex.org/rel/line-items" 
        href="http://ex.org/cart/line-items;token=987654321"/>
  <link rel="http://ex.org/rel/new-order" 
        href="http://ex.org/orders?cart=987654321"/>
</cart>

From a client perspective these are a bit different.

Abandoning cart
A client that wants to abandon a cart in the first example would make a DELETE or POST – it’s a bit hard to tell which from the example – request to the href of the http://ex.org/rel/abort link. In the second example a similar client would just DELETE the cart resource.
Adding item
When adding an item in the first example the client would post a www-form-urlencoded document containing the URI of the item to add and the quantity to the href of the http://ex.org/rel/add-more link. In the second example, the same document gets posted to the href of the http://ex.org/rel/line-items link.
Placing order
In the first example the client would make a POST request to the href of the http://ex.org/rel/buy link. In the second example the client would POST a www-form-urlencoded document containing the cart URI and some payment information to the href of the http://ex.org/rel/new-order link.

Differences

Obviously the two approaches result in quite similar markup. The same behavior is encoded in both. In the first example the links are action oriented. All actions that can be taken are explicitly stated using a link. In the second approach the links are data oriented rather than action oriented. Rather than having separate links to retrieve the current line items and to add a new line item, the http://ex.org/rel/line-items link provides both actions using the GET and POST HTTP methods respectively.

The first approach is better at expressing what actions are allowable at any given point in time. For example, once the purchase process has been initiated it does not make sense to abort a cart. So if you GET a cart after POSTing to the http://ex.org/rel/buy link, the representation would not have the http://ex.org/rel/abort link.

The second is more concise because it, at least potentially, provides access to more than one action per link based on the standard HTTP methods. You don’t need to provide a separate abort link because DELETEing the cart is sufficient. You don’t need to provide separate get line items and add line item links because a single link that can handle GET and POST requests will work.

The first approach is a bit more flexible with regard to implementation details. If you need, for some reason, to have different URIs for the retrieve line items request and the add line item request you could easily achieve it. The second example makes that impossible.

Conclusion

I am still not entirely convinced, but i am leaning toward the more flexible, verbose and explicit approach of a link for every action.2 Having links represent actions rather than resources feels a bit odd, but i think it provides more of the benefits we hope to get from a RESTful architecture.


  1. I am still not a fan of the link element. This example is a good one in every other regard.

  2. That counts as at least the third vacillation i have had on this topic. I was leaning the other direction before writing this.

Unobtrusive link info

Mr Amundsen’s recent post regarding the design of “semantic machine media types” got me thinking about media type design. One of the commonly encouraged practices, particularly on the REST discuss group, is the use of link elements.

I really dislike this idea. It sets my teeth on edge because it treats links – which are possibly the most important bits of data in existence – as second class citizens. It is easiest to show what i mean with a bit of extrapolation:

<complexElement rel="entry">
  <string rel="id">234132</string>
  <string rel="displayName">Peter Williams</string>
  <complexElement rel="name">
    <string rel="familyName">Williams</string>
    <string rel="givenName">Peter</string>
  </complexElement>
  <complexElement rel="emails">
    <link rel="email" href="mailto:pezra@barleyenough.org"/>
    <string rel="type">personal</string>
  </complexElement>
</complexElement>

That is what a portable contact might look like if we treated all data the way link elements work. That example looks pretty ugly to me, as i suspect it does to most people. It is ugly because very important information regarding the role of elements is relegated to a subsidiary position in favor of fairly unimportant information about their type. However, link elements do to links exactly what my example does to all the data. The effect is that properties whose values happen to be independently addressable resources are obfuscated.

The revealed preference of the world is against link elements. Just look at pretty much any format that embeds application specific semantics. As far as i know, there is not a single widely used format that actually represents its links as link elements. Even Atom uses properly named elements for most of its links. The link element it defines is largely relegated to the backwater of extensibility.

One benefit that link elements have, or at least could have if they were more widely used, is the facilitation of standard link processing tools. Fortunately, we do not have to give up the expressiveness and clarity of intention revealing names to achieve this result. Rather than obscuring the links we could just treat them as normal data. The additional information needed to support standard tools could be added in a relatively unobtrusive way. Consider the following:

<entry xmlns:link="http://unobtrusive-generic-linking.org/">
  ...
  <emails>
    <value link:hrefDisposition="elementContent" 
           link:rel="foo">
      mailto:pezra@barelyenough.org</value>
    <type>personal</type>
  </emails>
</entry>

This idea is similar to XLink but more flexible and simpler to use.

You could expand the idea to JSON with relative ease. Consider the following expansion of the portable contacts json format tagged with some unobtrusive link info.

{"entry": [
  {"id": "42",
  "emails":
    [{"address"   : "mailto:pezra@barelyenough.org",
      "type"      : "personal",
      "_linkInfo" : {"hrefDisposition" : "address", 
                     "rel" : "foo"}}]}]}

Unobtrusive link info makes links visible to and usable by generic link processing tools while protecting the use of intention revealing names that format designers, and users, want. This is important because it allows new formats to reuse “standard” link semantics more easily and uniformly.

In defense of link storage

It seems that more and more people are beginning to grasp the hypermedia constraint of REST. This is an unmitigated Good Thing. However, once you get hypermedia, the idea of a client persisting links that it has found starts to seem a little odd. For example, Kirk Wylie describes clients that store links as “not well behaved” in his excellent presentation on REST in financial systems. Even on the rest-discuss mailing list there is no consensus on the matter.

The idea of an application as a set of states (read: representations) with transitions (read: links) to other states seems to go against the idea of storing links. Transitions from one application state to another are surely transient. Any change in the application state, either by this client or some completely unrelated client, could easily invalidate those transitions. In that context a client that stored links for later use would surely be doomed to dereferencing dead links for the rest of its days.

Further, the idea that clients might store links is a frightening specter for maintainers of services. If clients store links, and you prefer not to break those clients, you must continue supporting any links you have ever included in any representation in perpetuity. Talk about limiting your design freedom. Such a strict requirement would surely raise the cost of maintaining the service over time.

Reality sets in

Those are scary thoughts. Some of these issues are even real. But in the end it doesn’t matter. Almost all non-trivial systems are going to require that URIs be stored in places other than the origin server. Sometimes these stored URIs will merely be caches. Other times they will be data that cannot be recalculated mechanically.

For example, say you have an order taking system and an inventory system. When placing an order the user goes to the web site, searches for “coffee”, selects the third item in the results and places an order for 1 of that item. An order is a set of line items, each of which references a product. Once payment is received the order system is going to need to be able to tell the shipping department which items from inventory to send to the customer.

The inventory system has, of course, a URI for every type of product that is for sale. So the simplest and most effective way for an order to reference a product is to use the inventory URI for that product. URIs are called universal resource identifiers for a reason; we might as well use them as such.

In this example, we have a situation where the product references in the order are not merely caches of URIs. Many things may change the ordering of search results: a new product being added, an old one being discontinued, even a small change to the description of some product. So at any moment the third item in the search results for “coffee” might be different. Once the user has made their selection no automaton can reliably retrace those steps.

The implications of this are significant. The inventory must continue to support the product URIs used in orders until such time as the order system would never care to dereference those URIs again. If a month from now the user comes back and wants to see their order history, those product URIs had better still work.

Fortunately, HTTP provides us with a ready solution. Behold the awesomeness that is HTTP redirection. HTTP redirection is your best friend when it comes to gracefully changing REST/HTTP applications. Clients get what they need – URIs continue to work as identifiers indefinitely – and servers get what they need – a lot of freedom to change the names and dispositions of resources.

We are still faced with this issue of the transient nature of links. Certainly, many links encode transitions which may be transient. The client has no general way of distinguishing between links which represent transiently available state transitions, and those that represent more permanent transitions.

In our example, immediately after creating an order, the order representation probably provides some links to pay for the order. After the user has provided payment those transitions would no longer be valid. However, the link to the inventory product is a more permanent part of the order resource.

The only tractable way i see to deal with this issue is to document the lifespan of the various links found in a representation. Once client implementers understand the semantics of the links they will often be able to infer the likely lifespan of the links without further input. However, guidance can be provided in situations where precision is required or the lifespan is ambiguous. A transient link is, by definition, an optional part of the representation so documenting the conditions that cause it to be present is likely to be required anyway.

Best practices: Server

REST/HTTP application developers should assume that clients will store links and dereference them after indeterminate periods of time. When resources are relocated or renamed, requests to the resource’s obsolete URI should be redirected to the canonical URI using a 301 Moved Permanently response.

For links whose validity has a bounded lifespan the documentation of the representations (the media type) should explicitly lay out that the link is transient and optional. If possible the documentation should also describe the conditions of the link’s existence.

Remind client developers early and often that clients must follow any and all redirects from the server.

Best practices: Client

Clients should follow redirects. Fastidiously.

Clients should update their internal storage upon receiving a 301 Moved Permanently response by replacing the URI they requested with the newly provided location.
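
A minimal sketch of that behavior using Ruby's standard library (redirect loop protection and error handling omitted for brevity):

require "net/http"

# Fetch `uri`, following 301s, and return the URI that should replace the
# stored one along with the final response.
def fetch_following_moves(uri)
  response = Net::HTTP.get_response(URI(uri))
  while response.is_a?(Net::HTTPMovedPermanently)
    uri = response["Location"]              # the new canonical URI
    response = Net::HTTP.get_response(URI(uri))
  end
  [uri, response]
end

stored_uri = "http://example.com/products/42"   # a previously bookmarked URI
stored_uri, response = fetch_following_moves(stored_uri)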

Client developers should be aware of transient links in the representations being dealt with. Either do not store these URIs or ensure that attempts to use these URIs handle failure in ways that make sense for the application.

Believe and follow the redirections the server sends to you. Seriously.