Mr Whitney recently posted an article in which he described mock objects as “bug aggregators”. I once held a similar point of view. Back then my belief was that test doubles (mock, stub, etc) should only be used when real objects would not work, either because they were too hard to setup or because they were too slow. Recently, however, my thinking towards mocks has become a bit more nuanced1.

A couple of months ago my team switched from Test::Unit to RSpec2 for our system validation needs. As part of that switch I read more deeply into the Behavior Driven Development(BDD) movement, which is where tools such as RSpec originate. BDD is, in many ways, just a collection of developer testing best practices combined with much better terminology and some wicked tools (like RSpec). While that may not sound like much it is the biggest improvement in the testing ecosystem since the emergence of systematic automated unit testing.

One of the things that the BDD promotes is the heavy use of test doubles to isolate the tested class from the rest of the system. BDD takes this to such an extreme that Martin Fowler has refers to practitioners of BDD as “mockist”. So when we switched to RSpec I decided to set my previously held notions aside and to give full fledged BDD a try3.

I have been mocking and stubbing my heart out for a couple of months. During that time I have reached the conclusion that test doubles can be very useful as design tools. At the same time they can, as Mr Whitney points out, hide some really nasty bugs.

What are tests for, anyway?

The tension I feel around test doubles comes from the fact that automated unit tests, or executable specification, serve multiple purposes. These purposes are almost never aligned. Improving a test for one purpose will reduce is usefulness for other purposes. As a bare minimum tests serve these purposes:

* tests verify that the code does what you intended * tests provide a set of examples for how to use a class * tests allow for comfortable evolution of the system by protecting against incompatible changes in the future * tests act as a proving ground for the design

While it is the least obvious, experienced practitioners of TDD/BDD often cite that last purpose as one of the most important. Tests have an amazing way of surfacing poor design choices, and test doubles magnify this. You quickly notice interfaces that are excessively complex when you have to mock and stub all the method calls involved. Hard to mock interfaces are a design smell.

Extensive use of test doubles radically enhances the quality of tests as tool for validating the design. At the same time degrades the usefulness of the tests for protecting you from future incompatible changes. I don’t think test doubles aggregate bugs but they are great at hiding them. If you use test doubles extensively you will, quite regularly, end up with two classes that do not work correctly with one another in the real system, but who’s tests pass. On the other hand, those classes will be better designed than if you were just using real objects.

In some languages (Java springs to mind) heavy use of test doubles might also degrade the quality of the tests as examples. If you are using a mocking system that requires a lot of code to mock an object it might tip the against mocking except in extreme cases. However, in Ruby this is not really an issue. The mock/stub library that comes with RSpec has such a clean and pretty syntax that it takes almost nothing away from the readability of the tests.

To counteract the degradation of tests as barrier to the code being inadvertently broken in the future, mockists usually suggest adding higher level of tests that are more about testing the system as a whole rather than the individual parts, or acceptance testing. These sorts of test are usually a good idea in their own right, but they also free you up to make more aggressive use of test doubles to improve the design of your code.

It’s all about tradeoffs

As with most engineering decisions, how much to use test doubles boils down to a set of trade-offs. You have to decide which of the uses of tests are most important for your project and then try to optimize along those dimensions. My currently feeling is that the design benefits of using test doubles usually outweighs the costs. Only you and your team can decide if that is true for your project, but you will never know until you give them a real try.

  1. In this case, as in most, “more nuanced” is a just a euphemism for “I am not sure what the best course of action is”.

  2. If you are doing ruby development you should be using RSpec. It is what Test::Unit, and all the other xUnit frameworks, should have been.

  3. For me, the best way to find the limits of something new is just to use it as aggressively as possible until I start finding situation where it does not work. Once it starts breaking down in real life you usually end up with a really good feel for the limits.