Vertical Slicing

I am a fan of polylithic architectures. Such architectures have many advantages related to evolvability and maintainability. When you decide to create a system composed of small pieces, how do you decide what functionality goes into which component?

Principles

The goal is to sub-divide the application into multiple highly cohesive components which are only weakly connascent with each other. To achieve the desired cohesion it will be necessary to align the component boundaries with natural fissure points in the application.

The strategy should allow for the production of an arbitrary number of components. A component that was of a manageable size yesterday could easily become too large tomorrow. In that situation the over-sized component will need to be sub-divided. Applying the same strategy repeatedly will result in a system that is more easily understood.

We want to minimize redundancy in the components. Redundancy results in more code which must be understood and maintained. More importantly, redundancy usually introduces connascence of algorithm, making changes more error prone and expensive. In a perfect world, any particular behavior would be implemented in exactly one component.

We want to isolate changes to the system. When implementing a new feature it is desirable to change as few components as possible. Each additional component that must be changed raises the complexity of the change. The componentization strategy should minimize the number of components involved in the average change to the system.

With those metrics in mind let’s explore the two most common approaches and see how they compare with each other. Those two patterns of componentization are horizontal slicing and vertical slicing.

Horizontal slicing

In this approach the component boundaries are derived from the implementation domain. The implementation is divided into a set of stacked layers in such a way that a layer initiates communication only with the layers below it. This results in a standard layered architecture. Implementing each layer in a separate component gives you horizontal slicing. This style of componentization results in the very common n-tier architecture pattern.

For example, an application that has a business logic layer and a presentation layer would be divided into two components: a business logic component and a presentation component.
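
A minimal sketch of that two-layer split might look like the following. The module and method names are hypothetical, and each module merely stands in for what would be a separately built component.

```ruby
# Hypothetical two-layer application, sliced horizontally.
# Each module stands in for a separate component.

# Lower layer: knows the business rules, nothing about display.
module BusinessLogic
  def self.order_total(prices)
    prices.sum
  end
end

# Upper layer: formats results for people and initiates all
# communication with the layer below it.
module Presentation
  def self.render_total(prices)
    total = BusinessLogic.order_total(prices)
    format("Total: $%.2f", total)
  end
end

puts Presentation.render_total([1.5, 2.5])
```

Note that the dependency only points downward: BusinessLogic never refers to Presentation.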

Vertical slicing

In this approach the component boundaries are derived from the application domain. Related domain concepts are grouped together into components. Individual components communicate with any other components as needed.

This approach is also quite common but is usually thought of a lot less formally. It is more common for this type of segmentation to develop incidentally, for example because separate teams developed the parts independently and then integrated them later. Any time you integrate separate applications you have vertical componentization.
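
As a rough illustration, here is a hypothetical store application sliced by domain concept rather than by layer. The Catalog and Ordering names are assumptions made up for this sketch, and each module stands in for a separate component.

```ruby
# Hypothetical store application, sliced vertically by domain concept.

# Owns everything about the product catalog.
module Catalog
  PRICES = { "widget" => 250, "gadget" => 1000 }.freeze # prices in cents

  def self.price_of(sku)
    PRICES.fetch(sku)
  end
end

# Owns everything about orders, including how an order is displayed.
# It talks to Catalog only when it needs catalog data.
module Ordering
  def self.total(skus)
    skus.sum { |sku| Catalog.price_of(sku) }
  end

  def self.render_total(skus)
    format("Total: $%.2f", total(skus) / 100.0)
  end
end

puts Ordering.render_total(%w[widget gadget])
```

Each component here contains its own logic and its own presentation for one part of the domain.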

The Score

Against the metrics we laid out earlier, vertical slicing does much better than horizontal.

                  Horizontal slicing   Vertical slicing
Cohesion          high                 high
Repeatability     low                  high
DRYness           low                  high
Change isolation  low                  high

Cohesion

Horizontal slicing has high cohesion. Each of the components can represent a logically cohesive part of the implementation.

Vertical slicing also has high cohesion. Each component represents a highly cohesive part of the application domain.

Repeatability

Vertical slicing provides a mechanism for reapplying the subdivision pattern an arbitrary number of times. If any component gets too large to manage it can be divided into multiple components based on the application domain concepts. This same process can be repeated, starting from the initial division of a monolithic application, until components of the desired size have been achieved.

Horizontal slicing is less repeatable. The more tiers there are, the harder it is to maintain cohesiveness. In practice it is very rare to see a tiered architecture with more than 4 tiers, and 3 tiers is much more common.

DRYness

Horizontal slicing tends to result in some repetition. Certain behaviors will have to be repeated at each layer. Data validation rules are one example. You will need those in the presentation layer to provide good error messages and in the business logic layer to prevent bad data being persisted.
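
To make the repetition concrete, here is a small sketch with hypothetical layer modules and a deliberately simplistic email rule. The same pattern appears in both layers, so changing it in one place means remembering to change it in the other.

```ruby
# Hypothetical layers that both need the same email rule.

module Presentation
  # Checks the input so the user gets a friendly message.
  def self.email_error(email)
    "Please enter a valid email address" unless email =~ /\A[^@\s]+@[^@\s]+\z/
  end
end

module BusinessLogic
  # Checks the same rule again so bad data is never persisted.
  def self.save_email(email, store)
    raise ArgumentError, "invalid email" unless email =~ /\A[^@\s]+@[^@\s]+\z/
    store << email
  end
end

p Presentation.email_error("not-an-email")
p BusinessLogic.save_email("me@example.com", [])
```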

Vertical slicing allows you to reduce the connascence of algorithm because any single user activity is implemented in exactly one component. Components usually do end up communicating with each other. However, they do so in a way that does not require the same algorithms to be implemented in multiple components. For any one bit of data or behavior, one component will be its authoritative source.

Change isolation

Vertical slicing tends to allow new features to be implemented by changing only one component. The component changed is the one which already contains features cohesive with the new one.

Horizontal slicing, on the other hand, tends to require changes in every layer. The new feature will require additions to the presentation layer, the business logic layer and the persistence layer. Having to work in every layer increases the cognitive load required to achieve the desired result.

Conclusion

Vertical slicing provides significant advantages. The high cohesion, DRYness, and change isolation combine to drastically reduce the risk and cost of change. That in turn allows better and faster maintenance and evolution of the system. The repeatability allows you to retain these benefits even while adding functionality over time. Each time a component gets too large you can divide it until you have reached an application size that is human scaled.

Having a large number of components operate as a system does result in a good deal of communication between the components. It is important to pay attention to the design of the APIs. Poor API design can introduce excessive coupling which will eat up most of the advantages described above. Hypermedia – or more precisely, following the REST architectural style – is the best way i know to reduce coupling between the components.

Sentence of the day

Anyhow, I’d just conclude by asserting that my new Emacs/Gnus/Org/ERC setup beats my old vim/mutt/nothing/irssi to the death with a baseball bat. :-)

Julien Danjou

Is ruby immature?

A friend of mine recently described why he feels ruby is immature. I, of course, disagree with him. There is much in ruby that could be improved, but the issues he raised are a) intentional design choices or b) weaknesses in specific applications built in ruby. Neither of those scenarios can be fairly described as immaturity in the language, or the community using the language.

Set

Mr. Jones’ main example is one regarding the Set class in ruby. In practice Set is a rarely used class in ruby. I suspect it exists primarily for historical and completeness reasons. It is rather rare to see idiomatic ruby that utilizes Set.1

This is possible because Array provides a rather complete implementation of basic set operations. Rubyists are very accustomed to using arrays. So it is more common to just use the set operators on arrays rather than converting an array into a set.

The set operations on Array do not have the same performance characteristics Mr. Jones found with Set. For example,

$ time ruby -rpp -e 'pp (1..10_000_000).to_a & (1..10).to_a'
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

real    0m10.152s
user    0m6.592s
sys     0m3.515s

$ time ruby -rpp -e 'pp (1..10).to_a & (1..10_000_000).to_a'
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

real    0m12.410s
user    0m8.397s
sys     0m3.860s

Order still matters, but very much less. (That is on 1.8.6, the only version i have handy at the moment. I am sure that 1.9, or even 1.8.7, would be quite a bit faster.)
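
For readers who have not used them, the Array set operators look like this on small inputs. This is only an illustration of the operators themselves, not a reproduction of the timings above.

```ruby
a = [1, 2, 3, 4]
b = [3, 4, 5]

p(a & b)              # intersection => [3, 4]
p(a | b)              # union        => [1, 2, 3, 4, 5]
p(a - b)              # difference   => [1, 2]
p([1, 1, 2, 2].uniq)  # duplicates removed => [1, 2]
```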

Libraries that are low traffic areas don’t get the effort that high use libraries do in any language. Even though Set is part of the standard library, it definitely counts as a low traffic area. Hence, it has never been optimized for large numbers of items. This is appropriate because, as we learned from Rob Pike, “n is usually small”. The benefits of handling large sets performantly are not worth the additional complexity for a low traffic library.

nil

In his other example Mr. Jones implies that the fact that nil is a real object is disadvantageous. On this count he is simply incorrect. Having nil be an object allows significant reductions in the number of special cases that must exist. This reduction in special cases often results in less code, and it always results in less cognitive load.

Consider #try in ruby. While not my favorite implementation of this concept, it is still a powerful idiom for removing clutter from the code.

#try executes the specified method on the receiver, unless the receiver is nil. When the receiver is nil it does nothing. This allows code to use a best effort approach to performing non-critical operations. For example2,

def remove_email(email)
  emails.find_by_email(email).try(:destroy)
end

This is implemented as follows:

module Kernel
  def try(method, *args, &block)
    send(method, *args, &block)
  end
end

class NilClass
  def try(*args)
    # do nothing
  end
end

You could implement something like #try in a system that has a non-object “no value” mechanism. It would be less elegant and less clear, though. (It would probably be less performant too because method calls tend to be optimized rather aggressively.) Having nil be an object like everything else means one less primitive concept that the code and the programmer must keep in mind.
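
To see the two definitions working together, here is a small self-contained sketch. It repeats the definitions so it can run on its own in plain ruby. The only change is that the NilClass version returns nil explicitly.

```ruby
module Kernel
  # Any ordinary object simply forwards the call to itself.
  def try(method, *args, &block)
    send(method, *args, &block)
  end
end

class NilClass
  # nil swallows the call and answers nil.
  def try(*args)
    nil
  end
end

p("ruby".try(:upcase))  # real receiver: the method is called => "RUBY"
p(nil.try(:upcase))     # nil receiver: quietly does nothing  => nil
```

No conditional appears anywhere. Ordinary method dispatch picks the right behavior because nil is an object with its own class.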

Mr. Jones does bring up the issue of nil.id returning 4 and that value being used as a foreign key in the database. This is not a problem i see very often, but it can happen.

This is definitely not a problem with ruby. Rather, it results from an unfortunate choice of naming convention in rails. Rails uses id as the name of the primary key column for database tables. This results in an #id method being created, which overrides the #id provided by ruby itself for all objects. If rails had chosen to call the primary key column something that did not conflict with an existing ruby core method – say pk – we would not be having this discussion.

In general

Mr. Jones asserts that “ruby is rife with happy path coding”. I disagree with his characterization. The ruby community has a strong bias towards producing working, if incomplete, code and iterating on that code to improve it. This “simplest thing that could work” approach does result in the occasional misstep and suboptimal implementation. In return you get to use a lot of new stuff more quickly, and when there are problems they are easier to fix because the code is simpler.

The ruby community has strongly embraced the small pieces, loosely joined approach. This is only accelerating the innovation in ruby. Gems have lowered the friction of distributing and installing components to previously unimaginable levels. This has allowed many libraries that would have been too small to be worth releasing in the past to come into existence.

Rack, with its middleware concept, is an example of the ruby community taking much of the Unix philosophy and turning it up to 11. While rails has much historic baggage, even it is moving to a much more modular architecture with the upcoming 3.0 release.

Following these principles does result in some rough edges occasionally, but the benefits are worth the trade. The 80% solution is how Unix succeeded. An 80% solution today is better than a 100% solution 3 months from now. (As long as you can improve it when needed.) We always have releases to get to, after all.


  1. I, on the other hand, do use set rather more than the average rubyist. Set is a rather performant way of producing collections without duplicate entries.

  2. Shamelessly copied from Chris Wanstrath.

“life elevated”

That is utah’s slogan, apparently, which is where we are today. We spent the last few days at the grand canyon and lake powell. Both are awe inspiringly beautiful. So much so that I will skip posting the completely inadequate pictures my phone captured.

Elliot and Audrey are keeping travel journals. So far Elliot has ended every entry with, “it was big.” The grand canyon definitely fits that description.

I recommend the fossil walk, guided by a ranger, at the grand canyon. It is really cool to find fossils for yourself. Audrey particularly enjoyed finding and keeping count of the fossils. Perhaps she really will grow up to be a paleontologist. (She is fond of claiming that as a future occupation.)

For all its grandeur, i am pretty sure the kids enjoyed swimming in lake powell far more. I understand that reaction. It is hard to beat cool water and a sandy beach in the heat of the desert.

Petrified forest

Today we visited petrified forest national park. We started in the painted desert area of the park. What a desolate, beautiful landscape.

After that we moved on to the petrified wood portion of the day. That stuff is just cool. It is amazing how wood-like the permineralized type is. The fully petrified type is really pretty.

The kids got their first jr ranger badges. At each national park kids can do some activities in the park and earn a badge and a patch for that park. It is a great way to keep the kids engaged.

I recommend “Here Comes Science” by They Might Be Giants for your next road trip. It is excellent driving music. Oh, and the kids like it too.


Sweet vacation

Vacation with the family has begun. I am very excited. We are going to see tons of interesting things in the southwest US over the next week.

Managing oss contributions with Git and Ruby Gems

Once you start using opensource at your day job you are going to want to improve it. Many improvements are going to be generally useful and should be contributed back to the community. A few of these changes may be quite specific and of no value to the community at large.

Changes that are generally useful should be contributed back to the community. This will help the community and help you. Every change to an opensource project you maintain raises the cost of keeping your modified variant current. Once a change you need is in the mainline project your company no longer has to maintain it alone.

Regardless of the generality of the changes you are going to want to put them in production quickly. Waiting for a change you’ve made to be integrated and released by the opensource project before putting it in production is probably not going to be an option. Whatever problem caused you to make the change needs solving, and soon. It could take weeks or months before even a good change is merged into the mainline of an opensource project.

Ruby, gems and git

A distributed version control system is key to the approach i use. Git is my preferred tool, but any DVCS would work. GitHub really makes this a lot easier than it would be otherwise.

I work mostly in Ruby so i am going to describe that workflow. Gems, particularly with the introduction of the new rubygems.org, really lower the bar for releasing Ruby software. The low effort required to do so can really make managing your corporate opensource contributions easier. A similar approach could be made to work for other release mechanisms.

What you will need

Making changes

Before getting started on the actual change there are some setup steps you need to perform. Namely, creating a version of this project specific to your company and a repo that allows you to publish the changes back to the community. These steps only need to be done once per opensource project.

Create a public repo for your changes

  1. Fork the canonical repo of the opensource project into your GitHub account.

  2. Clone your fork onto your machine.

    $ git clone {private URI of your repo}

Create a company specific version of the project

  1. Fork the canonical repo of the opensource project into your company’s GitHub account.

  2. Add yourself as a contributor to your company’s repo.

  3. Add a ‘foocorp’ remote to your local repo pointing to your company’s fork of the opensource project.

    $ git remote add foocorp {private URI of your companies repo on github}
  4. Create a ‘foocorp-stable’ branch.

    $ git checkout -b 'foocorp-stable'
  5. On the ‘foocorp-stable’ branch, change the name of the gem to ‘foocorp-projname’.

    $ (edit gemspec or gemspec generator)
    $ git commit -m "Company specific Gem name"
    $ git push foocorp foocorp-stable

You have created a version of this project whose gem has your company’s name prepended. This will be useful later as a way to release the changes you need before they have been integrated into the opensource project. However, this change lives only on your company’s stable branch. This branch will never be integrated into the opensource project.
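
As a rough sketch of what the renamed gemspec might contain, assuming a hand-written gemspec: only the name differs from upstream, and the version, summary and author values below are placeholders. If the project generates its gemspec, the same name change goes in the generator's configuration instead.

```ruby
require "rubygems"

# Hypothetical gemspec for the company-specific build.
spec = Gem::Specification.new do |s|
  s.name    = "foocorp-projname"   # upstream would be "projname"
  s.version = "1.2.3"              # placeholder
  s.summary = "FooCorp build of projname"
  s.authors = ["FooCorp"]
  s.files   = []
end

puts spec.full_name
```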

Making the change

  1. Create a feature branch for your change in your local repo.

    $ git checkout -b 'super-feature'
  2. Implement the wicked new feature/fix the bug.

    $ (do work)
    $ git commit -m "my feature is super"
  3. Push the feature branch to your GitHub repo.

    $ git push origin super-feature
  4. Push the feature branch to your company’s GitHub repo.

    $ git push foocorp super-feature
  5. Merge your feature branch into the ‘foocorp-stable’ branch.

    $ git checkout foocorp-stable
    $ git merge super-feature
  6. Push ‘foocorp-stable’ branch to your company’s GitHub repo.

    $ git push foocorp foocorp-stable
  7. Bump the version number as appropriate.

  8. Build the gem from the ‘foocorp-stable’ branch.1

    $ rake build
  9. Push the gem to rubygems.org.

    $ gem push pkg/{gem file}
  10. Change your application to require the ‘foocorp-projname’ gem instead of ‘projname’.

  11. Send a pull request to the opensource project for your feature branch.

The end result is that you have a published gem with the changes you need to support your application. This gem can be installed using the normal gem command. Your new gem boasts a name that will keep it from being confused with the original. The changes you implemented are available to the opensource project for the benefit of the community at large.

Once your changes have been integrated into the opensource project and released you can revert your application to depend on the canonical variant rather than your custom version.

Q & A

Why use two separate repos on GitHub (yours and your company’s) to manage changes?

Your company’s GitHub account will probably have its email address set up to point to a distribution list. Getting a change integrated into an opensource project can take some back and forth. By default, responses to a pull request go to the email of the account that sent the pull request. This means that your whole team would be getting these emails. As the original author of the change it is your responsibility to shepherd it through the integration, preferably without barraging the rest of your team with emails they don’t care about.

Why create a feature branch?

Because it is a lot easier for maintainers to merge a feature branch containing a limited cohesive set of changes. It is a little bit more of a pain for you but your changes will get integrated faster and more reliably. The opensource maintainers will thank you.

What about non-generic changes?

Follow the same process above except don’t send the pull request. If you ever want changes from the mainline opensource project you will need to merge those into your company’s stable branch explicitly. However, this is pretty easy to do with Git.

What if the opensource project adds a change i want before my changes are integrated?

Just merge the opensource project’s release tag (or any commit-ish for that matter) containing the change you want into your company’s stable branch. You can do this as many times as needed.

Why not modify the version of the gem rather than the name?

You could distinguish your custom gem by appending a suffix to the version. For example, ‘1.2.3.foocorp’. However, doing so would prevent you from pushing your gems to rubygems.org because someone else already owns that gem. It also prevents rational versioning for your gem. The versioning issue is important as you might want to make multiple independent changes to your gem.
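
One concrete aspect of the versioning problem can be seen with RubyGems' own version class. This is only an illustration of how RubyGems interprets such a suffix. It may not be the only issue meant above.

```ruby
require "rubygems"

custom   = Gem::Version.new("1.2.3.foocorp")
upstream = Gem::Version.new("1.2.3")

p custom.prerelease?   # => true, any letters mark a version as a prerelease
p custom < upstream    # => true, so it sorts before the upstream release
```

In other words, a suffixed version would be treated as older than the very release it is supposed to improve on.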

Conclusion

Using the technique described above you can very effectively manage changes to opensource projects that are required by your applications. Contributing your changes back does require maintaining and merging multiple code streams. This can be somewhat convoluted at times, but DVCSs allow a much more efficient approach than has ever been possible before.


  1. This example assumes you are using jeweler. If the opensource project is not using jeweler build the gem in whatever way the project supports.

Why MySQL Isn’t My Favorite RDBMS — Reason #767

Mysql::Error: Specified key was too long; max key length is 767 bytes: CREATE UNIQUE INDEX …

Seriously!? ‘Cause no one would ever want a unique constraint on a medium-sized varchar column.
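
A quick back-of-the-envelope calculation shows how short "too long" is, assuming the limit is counted in worst-case bytes per character for the column's character set (3 for MySQL's utf8, 4 for utf8mb4):

```ruby
MAX_KEY_BYTES = 767

# Worst-case bytes per character for two common MySQL character sets.
BYTES_PER_CHAR = { "utf8" => 3, "utf8mb4" => 4 }.freeze

BYTES_PER_CHAR.each do |charset, bytes|
  puts "#{charset}: at most #{MAX_KEY_BYTES / bytes} characters"
end
```

Under that assumption this prints 255 characters for utf8 and 191 for utf8mb4.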

Things i never thought i would say

it only shaved a second off the response time so i reverted it

It is not often that a full one second improvement to the response time of an HTTP request is so insignificant that it is not worth committing.

What are links

When designing hypertext formats is it better to provide links for every available action, or to provide links to related resources and let the client use the protocol interface to achieve particular actions on those related resources?

I have leaned in both directions at various times. I have never fully convinced myself either.

To make the issues a bit clearer let me use an example lifted from the article that got me thinking about this most recently.1

<cart>
  <!-- some stuff here -->
  <link rel="http://ex.org/rel/abort" 
        href="http://ex.org/cart/cancel;token=987654321"/>
  <link rel="http://ex.org/rel/add-more" 
        href="http://ex.org/cart/add;token=987654321"/>
  <link rel="http://ex.org/rel/buy" 
        href="http://ex.org/cart/buy;token=987654321"/>
</cart>

I place this example in the “links for every action” camp. Each of the links in the example describes exactly one action.

An alternate approach might look something like this.

<cart>
  <!-- some stuff here -->
  <link rel="http://ex.org/rel/line-items" 
        href="http://ex.org/cart/line-items;token=987654321"/>
  <link rel="http://ex.org/rel/new-order" 
        href="http://ex.org/orders?cart=987654321"/>
</cart>

From a client perspective these are a bit different.

Abandoning cart
A client that wants to abandon a cart in the first example would make a DELETE or POST – it’s a bit hard to tell which from the example – request to the href of the http://ex.org/rel/abort link. In the second example a similar client would just DELETE the cart resource.
Adding item
When adding an item in the first example the client would post a www-form-urlencoded document containing the URI of the item to add and the quantity to the href of the http://ex.org/rel/add-more link. In the second example, the same document gets posted to the href of the http://ex.org/rel/line-items link.
Placing order
In the first example the client would make a POST request to the href of the http://ex.org/rel/buy link. In the second example the client would POST a www-form-urlencoded document containing the cart URI and some payment information to the href of the http://ex.org/rel/new-order link.
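
A sketch of the three requests a client would build under the second approach might look like this. Nothing is sent over the network. The cart URI and the form field names are assumptions, since the example representation shows neither.

```ruby
require "net/http"
require "uri"

# The cart URI is an assumption; the other two come from the links
# in the second example representation.
cart_uri       = URI("http://ex.org/cart;token=987654321")
line_items_uri = URI("http://ex.org/cart/line-items;token=987654321")
new_order_uri  = URI("http://ex.org/orders?cart=987654321")

# Abandoning the cart: DELETE the cart resource itself.
abandon = Net::HTTP::Delete.new(cart_uri)

# Adding an item: POST a form-encoded document to the line-items link.
# The "item" and "quantity" field names are hypothetical.
add_item = Net::HTTP::Post.new(line_items_uri)
add_item.set_form_data("item" => "http://ex.org/items/42", "quantity" => "2")

# Placing the order: POST the cart URI and payment details.
# The "cart" and "payment" field names are hypothetical.
place_order = Net::HTTP::Post.new(new_order_uri)
place_order.set_form_data("cart" => cart_uri.to_s, "payment" => "placeholder")

[abandon, add_item, place_order].each do |req|
  puts "#{req.method} #{req.path}"
end
```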

Differences

Obviously the two approaches result in quite similar markup. The same behavior is encoded in both. In the first example the links are action oriented. All actions that can be taken on an item are explicitly stated using a link. In the second approach the links are data oriented rather than action oriented. Rather than having separate links to retrieve the current line items and to add a new line item, the http://ex.org/rel/line-items link provides both actions using the GET and POST HTTP methods respectively.

The first approach is better at expressing what actions are allowable at any given point in time. For example, once the purchase process has been initiated it does not make sense to abort a cart. So if you GET a cart after POSTing to the http://ex.org/rel/buy link the representation would not have the http://ex.org/rel/abort link.
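
From the client side, that check can be as simple as looking for the link relation. The sketch below assumes the representation has already been parsed into a hypothetical rel => href hash. Whether the buy link is still present after purchase is also an assumption.

```ruby
ABORT_REL = "http://ex.org/rel/abort"
BUY_REL   = "http://ex.org/rel/buy"

# An action is allowed exactly when the server offered a link for it.
def can?(links, rel)
  links.key?(rel)
end

open_cart = {
  ABORT_REL => "http://ex.org/cart/cancel;token=987654321",
  BUY_REL   => "http://ex.org/cart/buy;token=987654321"
}

# After the purchase has begun the server no longer offers the abort link.
purchased_cart = open_cart.reject { |rel, _href| rel == ABORT_REL }

p can?(open_cart, ABORT_REL)       # => true
p can?(purchased_cart, ABORT_REL)  # => false
```

The client needs no knowledge of the purchase workflow. It just reacts to what the representation offers.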

The second is more concise because it, at least potentially, provides access to more than one action per link based on the standard HTTP methods. You don’t need to provide a separate abort link because DELETEing the cart is sufficient. You don’t need to provide separate get line items and add line item links because a single link that can handle GET and POST requests will work.

The first approach is a bit more flexible with regard to implementation details. If you need for some reason to have a different URI for the retrieve line items request than for the add line item request you could easily achieve it. The second example makes that impossible.

Conclusion

I am still not entirely convinced but i am leaning toward the more flexible, verbose and explicit approach of a link for every action.2 Having links represent actions rather than resources feels a bit odd, but i think it provides more of the benefits we hope to get from a RESTful architecture.


  1. I am still not a fan of the link element. This example is a good one in every other regard.

  2. That counts as at least the third vacillation i have had on this topic. I was leaning the other direction before writing this.