Mr Whitney recently posted an article in which he described mock objects as “bug aggregators”. I once held a similar point of view. Back then I believed that test doubles (mocks, stubs, etc.) should only be used when real objects would not work, either because they were too hard to set up or because they were too slow. Recently, however, my thinking about mocks has become a bit more nuanced[1].

A couple of months ago my team switched from Test::Unit to RSpec[2] for our system validation needs. As part of that switch I read more deeply into the Behavior Driven Development (BDD) movement, which is where tools such as RSpec originate. BDD is, in many ways, just a collection of developer testing best practices combined with much better terminology and some wicked tools (like RSpec). While that may not sound like much, it is the biggest improvement in the testing ecosystem since the emergence of systematic automated unit testing.

One of the things BDD promotes is the heavy use of test doubles to isolate the class under test from the rest of the system. BDD takes this to such an extreme that Martin Fowler refers to practitioners of BDD as “mockists”. So when we switched to RSpec I decided to set my previously held notions aside and give full-fledged BDD a try[3].

I have been mocking and stubbing my heart out for a couple of months. During that time I have reached the conclusion that test doubles can be very useful as design tools. At the same time they can, as Mr Whitney points out, hide some really nasty bugs.

What are tests for, anyway?

The tension I feel around test doubles comes from the fact that automated unit tests, or executable specifications, serve multiple purposes. These purposes are almost never aligned. Improving a test for one purpose will reduce its usefulness for the others. At a bare minimum, tests serve these purposes:

* tests verify that the code does what you intended
* tests provide a set of examples for how to use a class
* tests allow for comfortable evolution of the system by protecting against incompatible changes in the future
* tests act as a proving ground for the design

While it is the least obvious, experienced practitioners of TDD/BDD often cite that last purpose as one of the most important. Tests have an amazing way of surfacing poor design choices, and test doubles magnify this. You quickly notice interfaces that are excessively complex when you have to mock and stub all the method calls involved. Hard-to-mock interfaces are a design smell.
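To make the “hard to mock is a design smell” point concrete, here is a minimal sketch in plain Ruby. The shipping calculators and every class name in it are hypothetical, and hand-rolled Structs stand in for RSpec doubles so the example runs without any gems. The calculator that reaches through a chain of collaborators needs a stub for every link in the chain; the one that asks for exactly what it needs gets by with a single stubbed method.

```ruby
# Hypothetical shipping calculators; all names are illustrative only.

# Version A reaches through a chain of collaborators to get one value.
class ChainedShippingCalculator
  def cost_for(order)
    order.customer.address.country == "US" ? 5 : 20
  end
end

# Version B asks its collaborator directly for the one value it needs.
class FocusedShippingCalculator
  def cost_for(order)
    order.shipping_country == "US" ? 5 : 20
  end
end

# Hand-rolled stubs for version A: one per link in the chain.
StubAddress  = Struct.new(:country)
StubCustomer = Struct.new(:address)
StubOrder    = Struct.new(:customer)
chained_order = StubOrder.new(StubCustomer.new(StubAddress.new("US")))

# The stub for version B needs exactly one method.
FocusedOrder = Struct.new(:shipping_country)
focused_order = FocusedOrder.new("US")

puts ChainedShippingCalculator.new.cost_for(chained_order)  # 5
puts FocusedShippingCalculator.new.cost_for(focused_order)  # 5
```

Both calculators compute the same answer; the difference shows up only in how much scaffolding the test has to build, which is exactly the pressure toward simpler interfaces described above.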

Extensive use of test doubles radically enhances the quality of tests as a tool for validating the design. At the same time, it degrades their usefulness for protecting you from future incompatible changes. I don’t think test doubles aggregate bugs, but they are great at hiding them. If you use test doubles extensively you will, quite regularly, end up with two classes whose tests pass but that do not work correctly with one another in the real system. On the other hand, those classes will be better designed than if you had just used real objects.
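Here is a minimal sketch of how that happens, again in plain Ruby with a hand-rolled stub so it runs without RSpec. The `TaxTable`/`Invoice` pair is hypothetical: the real `TaxTable` returns a whole-number percentage, while `Invoice` (and the stub written alongside its tests) assumes a fraction. Each class is correct by its own tests, yet the wired-up system is wrong.

```ruby
# Hypothetical real collaborator: returns the rate as a whole-number
# percentage (7 means 7%).
class TaxTable
  RATES = { "CO" => 7 }.freeze

  def rate_for(state)
    RATES.fetch(state)
  end
end

# Class under test: assumes rate_for returns a fraction (0.07 means 7%).
class Invoice
  def initialize(tax_table)
    @tax_table = tax_table
  end

  def total(amount, state)
    amount * (1 + @tax_table.rate_for(state))
  end
end

# Hand-rolled stub that encodes Invoice's (wrong) assumption about TaxTable.
class StubTaxTable
  def rate_for(_state)
    0.07
  end
end

# Isolated test: the stub agrees with Invoice, so this looks fine.
isolated = Invoice.new(StubTaxTable.new).total(100, "CO")
puts "with stub: #{isolated.round(2)}"

# Real wiring: the two classes disagree about units.
real = Invoice.new(TaxTable.new).total(100, "CO")
puts "with real TaxTable: #{real}"  # with real TaxTable: 800
```

An RSpec double such as `double("TaxTable", rate_for: 0.07)` would hide the mismatch in exactly the same way, because it returns whatever the test author believes the collaborator returns.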

In some languages (Java springs to mind), heavy use of test doubles might also degrade the quality of the tests as examples. If your mocking library requires a lot of code to mock an object, that might tip the balance against mocking except in extreme cases. In Ruby, however, this is not really an issue. The mock/stub library that comes with RSpec has such a clean and pretty syntax that it takes almost nothing away from the readability of the tests.

To counteract the weakening of tests as a barrier against the code being inadvertently broken in the future, mockists usually suggest adding a higher level of tests, such as acceptance tests, that exercise the system as a whole rather than its individual parts. These sorts of tests are usually a good idea in their own right, but they also free you to make more aggressive use of test doubles to improve the design of your code.
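A minimal sketch of that layering, assuming hypothetical `UserStore` and `Greeter` classes and plain Ruby instead of RSpec: the unit-level check isolates `Greeter` with a stub, while the acceptance-level check wires up the real objects the way production would. The store returns String keys but `Greeter` reads a Symbol key, so only the acceptance check notices.

```ruby
# Hypothetical persistence class: rows come back with String keys.
class UserStore
  def initialize
    @rows = { 1 => { "name" => "Ada" } }
  end

  def find(id)
    @rows.fetch(id)
  end
end

# Class under test: assumes rows have Symbol keys.
class Greeter
  def initialize(store)
    @store = store
  end

  def greet(id)
    "Hello, #{@store.find(id)[:name]}!"
  end
end

# Hand-rolled stub used by the isolated unit test.
class StubStore
  def initialize(row)
    @row = row
  end

  def find(_id)
    @row
  end
end

# Unit level: Greeter in isolation, with the stub.
unit_ok = Greeter.new(StubStore.new({ name: "Ada" })).greet(1) == "Hello, Ada!"

# Acceptance level: real objects wired together.
acceptance_ok = Greeter.new(UserStore.new).greet(1) == "Hello, Ada!"

puts "unit test passes:       #{unit_ok}"        # true
puts "acceptance test passes: #{acceptance_ok}"  # false
```

The unit test still earns its keep as a design check; the acceptance test is the safety net that catches the integration mismatch the stub papered over.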

It’s all about tradeoffs

As with most engineering decisions, how much to use test doubles boils down to a set of trade-offs. You have to decide which uses of tests are most important for your project and then try to optimize along those dimensions. My current feeling is that the design benefits of using test doubles usually outweigh the costs. Only you and your team can decide if that is true for your project, but you will never know until you give them a real try.

  1. In this case, as in most, “more nuanced” is just a euphemism for “I am not sure what the best course of action is”.

  2. If you are doing Ruby development you should be using RSpec. It is what Test::Unit, and all the other xUnit frameworks, should have been.

  3. For me, the best way to find the limits of something new is just to use it as aggressively as possible until I start finding situations where it does not work. Once it starts breaking down in real life you usually end up with a really good feel for its limits.

9 thoughts on “Mocking”

  1. Test frameworks were invented for young folks who don’t routinely test code as it is written. We old-timers usually write code correctly, and that is largely because each function point is tested as it is created. Someone had to do something about the kids who kept writing code that did not function correctly. So now we all have to live with formal test software. Of course, IT managers and team leads never did understand the problem, but the CS media and vendors convinced them that they needed these test tools to solve it.

  2. Someone take Dick out and make him maintain a system with a few million lines of (COBOL) code and 1980s (mainframe) testing tools & methods.

    Just watch the backlog grow and the code base degrade! Of course, any organization like that which is still alive today is in a monopoly situation, like big govt departments and big oil.

    Unless you’re in a monopoly (natural or unnatural), if you don’t automate as much as possible, someone (be it down the street, in the next state or in India) is going to eat your lunch at some stage.

  3. Actually, I have maintained several million line Cobol systems back in the day. However, “write using record” usually solved the problem. The situation was improved somewhat with the development of Endevor, etc. and search tools. And, incidentally, we managed to keep these systems up and running, even the 7/24 beasts at MCI in COS that supported the base telecommunications network. I just find it a pain in the butt to write formal test cases when my code just about always works because I test as I go.

  4. I’ve worked on a number of these sites in recent years, and they are still using 80s tools and methods today!

    Endevor sucks, and it is one of those 80s tools I was talking about! I worked with Endevor as late as a year ago. It is one bit of the solution. On the mainframe, most bits are still missing.

    We can keep these old, degrading systems in the ‘air’, but the backlogs (often hidden) and the time it takes to get a change through the system both become longer & longer.

    It’s been said that > 95% of all problems are management problems. An organization that gets itself into this state has only its management to blame.

    Organizations in this technological state tend to have other management problems too. Arse covering, blame games, miles of red tape, the list goes on. These all create wasted time/labour & looooong feedback cycles.

    The doers on the floor have to live with the consequences.

    What I get sick of is finding broken corner cases. Writing a fix. Writing a detailed test setup for the Test team. You know that once the fix is done, the test case and all that supporting work will never be read or used again (due to time/labour costs, because it’s not been automated). Some future change could break it again and no one will know until the system breaks or the database is corrupted or a customer complains or …

    The result of any design process is a set of unvalidated design decisions.

    The other half of the process is design validation. In other engineering disciplines, it’s called analysis; IT uses this term for something else. Until mathematical proof methods become computationally cheap, the best validation process is testing. The cheapest (cost per individual test execution) testing system is an automated one.

    Mocking is something that can greatly reduce time costs. This means the revalidation of a design can be done more often. The closer the revalidation is to the point of design change, the better & faster the design feedback loop works.

    I, as a professional, have a responsibility to the shareholders/taxpayers (and other stakeholders) to ensure that the money invested in the development process is invested in the best way possible. That means looking at both the short and longer term returns that can be gained from the different ways of doing my job.

  5. I will add that, due to the age/nature of COBOL (things like granularity, the global-only memory model etc.), these missing tools will likely never exist.

    I want to use a language that is younger than I am!

  6. gnoll,

    You express a lot of true ideas about the life of a mainframe programmer! Cobol was the programming language of the masses back then. I used to write a lot of PL/I and ASM at IBM. With PL/I, we could develop external subroutines that were reusable and were linked into the manager module. So we could unit test each subroutine with a “main” stub. After that, we hooked the validated subroutines into the actual mainline and performed integration testing. PL/I, with pointers and dynamic memory allocation, was the Cadillac of programming languages. Only an elite few mastered it. And ASM (BAL) was used for the performance hot spots in a PL/I application system.
