Test Anti-Pattern: Proving The Code is Written Like the Code Is Written

Testing is suppose to verify that code does what we want. However, I have seen many “unit” tests that spend a lot of time verifying that code works how they wrote it, rather than verifying that it achieves the desired result. The worst cases of this I have seen involve the use of mock objects. This is an real example (I changed the names to protect the guilty) of that anti-pattern (using Java and EasyMock):

public void testDoSomthingUseful()
  MockControl mockObjToTestControl = MockClassControl.createControl(ClassToTest.class);
  ClassToTest mockObjToTest = (ClassToTest)mockObjToTestControl.getMock();

  ClassToTest objToTest = new ClassToTest()
    String doSubtask1()
      return mockObjToTest.doSubtask1();

    String doSubtask2()
      return mockObjToTest.doSubtask2();




As you can see this code proves that the doSomethingUseful() calls do doSubtask1() and doSubtask2(). The problem is that this is a total useless test. The only thing it proves is that the code is written the way the code is written. There are not even any assertions that the desired result as achieved. Even if there were some assertions made about the the results of doing something useful you still have a test which is completely and absolutely tied to the current implementation of the doSomethingUseful() method.

Tests should check for the desired behavior1 and avoid checking implementation details. Our tests should not care if next week the doSomethingUseful() method is refactored such that it does not call doSubtask1() as long as it still achieves the same result. The above example is an extreme case and very few people write tests as pathologically broken as that. However, a disturbing number of people write tests with little thought to whether their are verifying desired behavior or implementation details.

A key way to encourage this behavior oriented approach is to only use an objects, globally, public interface in the tests. One objection I have heard to this approach is that you will miss bugs because you correctness checks will, inevitably, be made at a higher level. This argument is just wrong. Any bug that is undetectable from the public interface does not matter. Of course, detecting some bugs via the public interface might be a bit more difficult than immediately reaching into the guts of your objects but it will be better because you will be testing the intended behavior of those objects, not their implementations. As a side benefit you will be implicitly documenting the public interface with is a Good Thing.

As with any rule there are a few situation in which I do write tests against non-public parts of an objects interface. For example, if you have a non-public method that you suspect is likely to break (e.g. it embodies a non-trivial algorithm or involves non-trivial interaction with other components) and it is difficult to diagnose it’s failures from the public interface. However, I consider this confluence of situations to be extremely suspect. It will most often result from sub-optimal design, better solved by refactoring than than by writing tests against the non-public interface.

1. Dave Astels is working on something he calls BDD (these slides are good too). I think he has got it just about right and I am looking forward to trying out RSpec real soon now.

What Is A “Unit” Test

I think that one reason unit testing is such a big win is that it integrates design verification into the design (read: coding) process. (See Jack Reeves’ Code As Design.) This connection has made me start to pay a little more attention to the unit tests I see around me and I have found some odd ideas about what is a unit test vs a functional test. There is quite a lot of room for interpretation here because the name is, probably intentionally, vague. These ideas, that I think are odd, have caused me to refine my idea of what a unit test is.

One of the odd ideas is that a unit test should only test a single class. In some ways this seems to be reasonable. This is basically the unit test orthodoxy that I originally learned, though perhaps taken to an extreme. However, I think this view of unit tests is not internally consistent. Even simple unit tests test how different classes fit together. When is the last time you mocked the String, List or other basic classes that come with the language you are using? No one does that because it is not worth the effort. But by not doing that you are inherently not testing a single class, you are testing that a set of classes work together correctly.

REXML could not parse this XML/HTML: 
<p>The other odd idea is that a test that interacts the outside world &#8212; the file system, network, etc. &#8212 should be called "functional" tests rather than unit tests.  I am a firm believer that names are, almost always, important.  The impact of calling the tests functional, rather than unit, is that they do not get run nearly as often.  This happens because the "functional" tests often require significant setup.  So everyone basically avoids running the functional tests that are not directly related to the code on which they are working, even if some portion of them are easy to run.</p>

I dislike both of the above ideas. In a perfect world, I think unit tests would test every possible use of every bit of code in the system and they would be run before every check-in. This is obvious unrealistic but I think you end up at a better place if you start at what you would really like, and then remove only the parts that are not feasible. There are two main obstacles to my perfect world unit tests. First, they would take a long time for any non-trivial application. Worse yet, some tests are by definition long running and you do not want to run those tests every time you check-in. Second, some of the tests you would like require an environment that is difficult to recreate programmatically, for example if you need a DB in known state, middle-ware that only your test is using, etc.

TestNG has a nice solution for long testing times. (There is a lot about TestNG that concerns me but this feature is cool.) You can annotate your tests in such a way that you can exclude tests which take a long time from any particular test run. Those long running tests can still be unit tests, just like all the other unit tests, but they can be excluded when it is important for the testing to complete quickly. In other frameworks you can achieve the same behavior by sequestering your long tests in particular fixtures which are not included in the “normal” test suite, but are run by your automated build system.

The second issue is the one concerns me the most. This is because running these tests is not just a matter of waiting for them to finish but rather knowing how to setup the environment such that they will work at all. Normally these tests are off loaded to a QA or test team. This means that you have moved the verification of the design to a team outside of your control. This strikes me as sub-optimal because is lengthens the time between defect introduction and detection. Unfortunately, I do not have a good solution for this problem. In most environments it is more acceptable to have a QA team run these sorts of tests than to have the developers spend a lot of time setting up and maintaining multiple independent environments for tests purposes. Joel points out that in some situations VMware may be a good solutions to this problem.

For me, the difference between a unit test and a functional test has nothing to do with what is tested, but rather when the test is run. If it is primarily run as part of coding it is a unit test. If it is primarily run outside of the coding process it is a functional test. The sooner you can find a flaw in the design the better so it is better to do as much testing as possible as part of coding, so I think all levels of software should be tested in unit tests. There should be fixtures that test individual classes, the interactions of classes in individual packages and the interactions of classes across packages. These fixture should do what ever is necessary to support the tests, including writing to the file system or network. All of those tests go into the main test suite unless one of the following is demonstrably true.

  • It is not feasible to setup the appropriate environment programmatically
  • The test would take an unreasonable amount of time to execute

Those two criteria are a bit vague, and intentionally so. I think that every project has to decide for itself what feasible and reasonable. I can say that for the projects on which I have worked, the execution time of the tests has been completely immaterial, with the exception of load tests.