Article Review: Why Most Unit Testing is Waste

Andriy Drozdyuk
12 min read · Aug 24, 2016

--

In this article I record some excerpts from Why Most Unit Testing is Waste by James O Coplien. I encourage you to read the article first, then come back here for my comments on it, or use this as a quick reference to what I consider its most interesting points.

Part 1.1

So, testing became in again. And it was unit testing with a vengeance.

Just a beautiful sentence.

Design became much more data-focused because objects were shaped
more by their data structure than by any properties of their
methods. (p.2)

True! Can you recall how many times you have started a conversation with “the database should store this and that”?

The lack of any explicit calling structure made it
difficult to place any single function execution in the context of
its execution. What little chance there might have been to do so
was taken away by polymorphism. (p.2)

Imagine two subclasses of an abstract class: you can never be sure until run time which of them will be chosen. Now you have to look in two places, instead of just one, to see the possible paths of the program.
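A minimal Python sketch of the problem (the PaymentProcessor hierarchy here is made up purely for illustration):

```python
from abc import ABC, abstractmethod

class PaymentProcessor(ABC):
    @abstractmethod
    def process(self, amount: float) -> str: ...

class CardProcessor(PaymentProcessor):
    def process(self, amount: float) -> str:
        return f"charged {amount} to card"

class InvoiceProcessor(PaymentProcessor):
    def process(self, amount: float) -> str:
        return f"invoiced {amount}"

def checkout(processor: PaymentProcessor, amount: float) -> str:
    # Reading this call site alone, you cannot tell which process() will run;
    # the concrete subclass is only known at run time.
    return processor.process(amount)
```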

Classes additionally became the units of administration, design
focus and programming, and their anthropomorphic nature gave
the master of each class a yearning to test it. And because few
class methods came with the same contextualization that a
FORTRAN function did, programmers had to provide context
before exercising a method (remember that we don’t test classes
and we don’t even test objects — the unit of functional test is a
method). (p.2)

The rise of mocking is not a coincidence, as the author explains:

Unit tests provided the drivers to take methods through their paces. Mocks provided the context of the environmental state and of the other methods on which the method under test depended. And test environments came with facilities to poise each object in the right state in preparation for the test. (p. 2)
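A small illustration of that “context”, using Python’s standard unittest.mock (the OrderService and its rate-lookup collaborator are hypothetical names, not from the article):

```python
from unittest.mock import Mock

class OrderService:
    def __init__(self, rates):
        self.rates = rates  # collaborator the method under test depends on

    def total(self, amount, currency):
        return amount * self.rates.lookup(currency)

# The mock supplies the environmental state the method needs,
# without standing up a real rate service.
rates = Mock()
rates.lookup.return_value = 1.25

service = OrderService(rates)
assert service.total(100, "EUR") == 125.0
```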

Part 1.2

Unit testing is of course not just an issue in object-oriented programming, but the combination of object-orientation, agile software development, and a rise in tools and computing power has made it de rigueur. (p. 3)

It is the combination of multiple forces that made unit testing rise to power.

You’ll remember from your trade school education that you can model any program as a Turing tape, …

I would like to put this forth as requirement #1 for anyone advocating unit testing: know what a Turing Machine is. Why? Because you won’t understand the next bit otherwise:

and what the program can do is somehow related to the number of bits on that tape at the start of execution. If you want to thoroughly test that program, you need a test with at least the same amount of information: i.e., another Turing tape of at least the same number of bits. (p. 3)

In Cybernetics, this is called the Law of Requisite Variety which can be roughly stated as “Only complexity can defeat complexity”.

Clearly, then, to achieve complete testing of a program, one would need at least as much code as the thing being tested. However, things are much worse than that:

In real practice, the vagaries of programming language make it difficult to achieve this kind of compactness of expression in a test so to do complete testing, the number of lines of code in unit tests would have to be orders of magnitude larger than those in the unit under test. Few developers admit that they do only random or partial testing and many will tell you that they do complete testing for some assumed vision of complete. (p. 3)

Following is a nice bit of advice that should be printed above every developer’s door:

Be humble about what your unit tests can achieve, unless you have an extrinsic requirements oracle for the unit under test. Unit tests are unlikely to test more than one trillionth of the functionality of any given method in a reasonable testing cycle. Get over it.

With an amendment for non-believers:

(Trillion is not used rhetorically here, but is based on the different possible states given that the average object size is four words, and the conservative estimate that you are using 16-bit words). (p.4.)
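A quick back-of-envelope check of that claim, under the stated assumptions (four 16-bit words of object state):

```python
# Four words of 16 bits each gives 64 bits of possible object state.
states = 2 ** (4 * 16)      # about 1.8e19 distinct states

# Even a generous testing cycle of a million distinct cases
# exercises a vanishingly small fraction of that space.
cases = 1_000_000
print(cases / states)       # roughly 5e-14, i.e. well under one trillionth
```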

Part 1.3

Another piece of advice: do not split up your algorithms simply to satisfy unit-testing compliance:

If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it. Test at a coarser level of granularity. (p. 5)

Another reason why unit testing is infeasible is that it is simply impossible to test all the iterations of even a modest piece of code:

To test any reasonable combination of loop indices in a simple function can take centuries. (p. 5)
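Another rough estimate, assuming a nested loop over two 32-bit indices and an optimistic billion executions per second:

```python
combinations = (2 ** 32) ** 2           # every pair of 32-bit loop indices
per_second = 1_000_000_000              # optimistic throughput of test runs
seconds_per_year = 60 * 60 * 24 * 365

print(combinations / per_second / seconds_per_year)  # about 585 years
```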

The author criticizes the notion that automating unit tests makes them any better. The excerpt also contains a reference of interest to potentially read up on:

Remember, though, that automated crap is still crap. And those of you who have a coroporate [sic] Lean program might note that the foundations of the Toyota Production System, which were the foundations of Scrum, were very much against the automation of intellectual tasks... (p. 6)

An alternative advocated is to test at the system boundary (which is, to my mind, consistent with systems theory, where one does not care about the internal structure of a thing, but only about its behavior):

… formal test design: that is, to do formal boundary-condition checking, more white-box testing, and so forth. That requires that the unit under test be designed for testability. This is how hardware engineers do it: designers provide “test points” that can read out values on a J-Tag pin of a chip, to access internal signal values of the chip — tantamount to accessing values between intermediate computations in a computational unit. I advocate doing this at the system level where the testing focus should lie; I have never seen anyone achieve this at the unit level. Without such hooks you are limited to black-box unit testing. (p. 6)
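One way to read the “test point” idea in software terms: design a read-out into the component so system-level tests can observe an internal signal without dismantling the unit. The sketch below uses made-up names and is only my interpretation of the idea:

```python
class Pipeline:
    """A hypothetical processing pipeline with a designed-in test point."""

    def __init__(self):
        self._last_intermediate = None   # test point, analogous to a J-Tag read-out

    def run(self, payload):
        intermediate = self._normalize(payload)
        self._last_intermediate = intermediate   # expose the internal signal
        return self._aggregate(intermediate)

    def _normalize(self, payload):
        return [item.strip().lower() for item in payload]

    def _aggregate(self, items):
        return {"count": len(items), "items": items}

    def test_point(self):
        # Read-out used by system-level tests only; not part of the business API.
        return self._last_intermediate


pipeline = Pipeline()
print(pipeline.run(["  Foo", "BAR "]))   # {'count': 2, 'items': ['foo', 'bar']}
print(pipeline.test_point())             # ['foo', 'bar']
```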

The final bit of advice offered in this part is as follows:

Tests should be designed with great care. Business people, rather than programmers, should design most functional tests. Unit tests should be limited to those that can be held up against some “third-party” success criteria. (p. 6)

This is correct as well, in my opinion. Having business people design tests also forces the developer to think at the system level, since no business person in the world would ever come up with a test case like “Given that I issue a Select statement to the database, it should return a list of rows”.

Part 1.4

An interesting point about why programmers actually think test-first is a good practice (something I have always found odd):

Programmers have a tacit belief that they can think more clearly (or guess better) when writing tests than when writing code, or that somehow there is more information in a test than in code. (p. 7)

The author suggests that programmers use unit tests as a way of liberating themselves from the various restrictions imposed upon them:

Or it may be that your process makes it impossible to integrate frequently, because of bad process design or bad tools. The programmers are doing their best to compensate by creating tests in an environment where they have some control over their own destiny. (p. 7)

The other side of the coin is that maybe developers are not very good problem solvers:

Or the problem may be at the other end: developers don’t have adequately refined design skills, or the process doesn’t encourage architectural thinking and conscientious design. (p. 7)

That is not to say that this situation is a bad thing. For example, such developers could be put on Query-side duty in a CQRS architecture, while the more complex problems are assigned to more experienced programmers.

The next part is so true it hurts:

If you have comprehensive unit tests but still have a high failure rate in system tests or low quality in the field, don’t automatically blame the tests (either unit tests or system tests). Carefully investigate your requirements and design regimen and its tie to integration tests and system tests. (p. 8)

If I had a penny for every time I heard we need “better unit tests” or “more unit tests” when our system was failing or slow, I would probably have a couple of hundred dollars by now!

Part 1.5

On the fact that testing does not increase quality:

The purpose of testing is to create information about your program. (Testing does not increase quality; programming and design do. Testing just provides the insights that the team lacked to do a correct design and implementation.) (p. 8)

How much information do we get out of a series of tests that always passes (1 means success, 0 means failure)?

11111111111111111111111111111111

there is no information — by definition, from information theory. (p. 9)

On why developers like to keep useless tests around:

If we can’t predict at the outset whether a test will pass or fail then each test run contains a full bit of information, and you can’t get better than that. You see, developers love to keep around tests that pass because it’s good for their ego and their comfort level. (p. 9)
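To put numbers on that “full bit” (my own sketch, not from the article): the Shannon entropy of a test outcome is zero when the result is certain and one bit when it is a coin toss:

```python
import math

def entropy(p_pass):
    """Shannon entropy, in bits, of a test that passes with probability p_pass."""
    if p_pass in (0.0, 1.0):
        return 0.0
    p_fail = 1.0 - p_pass
    return -(p_pass * math.log2(p_pass) + p_fail * math.log2(p_fail))

print(entropy(1.0))   # 0.0 bits: a test that always passes tells you nothing
print(entropy(0.5))   # 1.0 bit: an unpredictable test carries maximum information
```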

The following coincides with my view on old tests (or tests that are so good they always pass):

If you want to reduce your test mass, the number one thing you should do is look at the tests that have never failed in a year and consider throwing them away. They are producing no information for you — or at least very little information. The value of the information they produce may not be worth the expense of maintaining and running the tests. This is the first set of tests to throw away — whether they are unit tests, integration tests, or system tests. (p. 9–10)

After a funny anecdote, where someone wrote unit tests which didn’t need to be rewritten when the code was changed (the tests were smart, you see), the author provides the criterion for another set of tests to throw out:

The third tests to throw away are the tautological ones. I see more of these than you can imagine — particularly in shops following what they call test-driven development. (p. 10)
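For anyone who hasn’t met the term, a tautological test is one that can only restate the implementation it exercises. A made-up Python example of the contrast:

```python
def add_discount(price, rate):
    return price * (1 - rate)

# Tautological: the expected value is computed by the very formula under test,
# so this test can never disagree with the implementation.
def test_add_discount_tautological():
    price, rate = 100.0, 0.2
    assert add_discount(price, rate) == price * (1 - rate)

# A requirement-driven test states the expected value independently.
def test_add_discount_requirement():
    assert add_discount(100.0, 0.2) == 80.0
```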

Indeed, most tests programmers come up with have rather little to do with the business logic of the system:

In most businesses, the only tests that have business value are those that are derived from business requirements. Most unit tests are derived from programmers’ fantasies about how the function should work: their hopes, stereotypes, or sometimes wishes about how things should go. (p. 10)

The author then hammers this point home by providing the last criterion for throwing out tests:

So one question to ask about every test is: If this test fails, what business requirement is compromised? Most of the time, the answer is, “I don’t know.” If you don’t know the value of the test, then the test theoretically could have zero business value. The test does have a cost: maintenance, computing time, administration, and so forth. That means the test could have net negative value. That is the fourth category of tests to remove. (p. 11)

An interesting rule of thumb for which unit tests are actually valuable:

… some systems have key algorithms — like network routing algorithms — that are testable against a single API. There is a formal oracle for deriving the tests for such APIs, as I said above. So those unit tests have value. (p. 11)

So it would seem that as long as you yourself are not the one defining what the outcome of the unit test should be, such a test is valid.
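A sketch of what such an oracle-backed unit test might look like; the shortest-path routine and the brute-force oracle below are hypothetical stand-ins, not code from the article:

```python
import heapq
import itertools

def shortest_cost(graph, src, dst):
    """Algorithm under test: Dijkstra-style cheapest route cost."""
    seen, queue = set(), [(0, src)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == dst:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph[node].items():
            heapq.heappush(queue, (cost + weight, nxt))
    return float("inf")

def oracle_cost(graph, src, dst):
    """Independent oracle: exhaustively price every simple path."""
    nodes = [n for n in graph if n not in (src, dst)]
    best = float("inf")
    for r in range(len(nodes) + 1):
        for mid in itertools.permutations(nodes, r):
            path = (src, *mid, dst)
            if all(b in graph[a] for a, b in zip(path, path[1:])):
                best = min(best, sum(graph[a][b] for a, b in zip(path, path[1:])))
    return best

def test_routing_against_oracle():
    graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
    # The expected result comes from the oracle, not from the code under test.
    assert shortest_cost(graph, "A", "D") == oracle_cost(graph, "A", "D")
```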

Part 1.6

This part talks a bit more about the impossibility of exhaustively testing the whole program space (which includes both the program and its surrounding environment). It is pointed out that determining whether the program succeeds in all possible tests is equivalent to the Halting Problem.

I would also put this forward as requirement #2 for anyone arguing in favor of unit tests: know what the Halting Problem is.

Part 1.7

An interesting fact is that on average a bug is more likely to be in the test code than in the code being tested:

If we randomly seed my client’s code base — which includes the tests — with such bugs, we find that the tests will hold the code to an incorrect result more often than a genuine bug will cause the code to fail! (p. 14)

Funny as it is, this next point is also the reality:

Some people tell me that this doesn’t apply to them since they take more care in writing tests than in writing the original code. (p. 14)

To which the author offers two rebuttals.

The first is that if we fix a bug in the system, then we also fix the test that now fails because of it, which, of course, is rather silly:

First… [they] tell me they are able to forget the assumptions they made while coding and bring a fresh, independent set to their testing effort. Both of them have to be schizophrenic to do that…
In any case, if a client reports a fault, and I hypothesize where the actual bug lies and I change it so the system behavior is now right, I can easily be led to believe that the function where I made the fix is now right. I accordingly overwrite the [test] for that function. But [… it’s] necessary to re-run all the regressions and system tests as well.

The second rebuttal is: why not just be more attentive when you write the code to begin with?

Second, even if it were true that the tests were higher quality than the code because of a better process or increased attentiveness, I would advise the team to improve their process so they take the smart pills when they write their code instead of when they write their tests. (p. 15)

Part 1.8

The author points out that tests are code too, and require maintenance:

The point is that code is part of your system architecture. Tests are modules. That one doesn’t deliver the tests doesn’t relieve one of the design and maintenance liabilities that come with more modules. (p. 15)

The better way is to embed assertions in the code, and leave them there:

When I look at most unit tests … they are assertions in disguise. I sprinkle [my software] with assertions that describe promises that I expect the callers of my functions to live up to, as well as promises that function makes to its clients. Those assertions evolve in the same artefact as the rest of my code… leave the assertions in the code when you ship, and to automatically file a bug report on behalf of the end user and perhaps to try to re-start the application every time an assertion fails. (p. 15–16)

The whole of unit testing is thus replaced by two halves:

Turn unit tests into assertions. Use them to feed your fault-tolerance architecture on high-availability systems. This solves the problem of maintaining a lot of extra software modules that assess execution and check for correct behavior; that’s one half of a unit test. The other half is the driver that executes the code: count on your stress tests, integration tests, and system tests to do that. (p. 16)
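A rough sketch of what this could look like in practice; the transfer function and the bug-reporting hook are placeholders of my own, not the author’s code:

```python
import logging

log = logging.getLogger("contracts")

def report_violation(message):
    # Stand-in for "automatically file a bug report on behalf of the end user".
    log.error("contract violated: %s", message)

def transfer(balance, amount):
    # Promise the caller must live up to (precondition).
    if amount <= 0:
        report_violation(f"transfer amount must be positive, got {amount}")
        raise ValueError("invalid transfer amount")

    new_balance = balance - amount

    # Promise this function makes to its clients (postcondition).
    if new_balance > balance:
        report_violation("balance increased after an outgoing transfer")
        raise AssertionError("postcondition failed")

    return new_balance
```

The assertions stay in the shipped code; stress, integration, and system tests act as the driver that exercises them.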

On the concern that some code may not be reachable via system tests (i.e. only reachable via unit testing):

If your testing interfaces are well-designed and can reproduce the kinds of system behaviours you see in the real world, and you find code like this that is unreachable from your system testing methodology, then…. delete the code! Seriously, reasoning about your code in light of system tests can be a great way to find dead code. That’s even more valuable than finding unneeded tests. (p. 17)

Part 1.9

Perhaps the most serious problem with unit tests is their focus on fixing bugs rather than on system-level improvement.

This is so true. Once, someone asked my team in a meeting “What does this system do?” and everyone, except me, laughed. This was because we mostly spent our time fixing bugs and not building a system.

Another point on appropriate use of unit tests:

System tests drop you almost immediately into this position of reflection. You still need the detailed information, of course, and that’s where debugging comes in. Debugging is the use of tools and devices to help isolate a bug. Debugging is not testing. It is ad-hoc and done on a bug-by-bug basis. Unit tests can be a useful debugging tool. (p. 18)

Part 1.10

Perhaps many people are used to unit testing because they don’t have to think about the problem very much. This is very different from the way some people used to program:

They had a week to prepare their code and just one hour per week to run it. They had to do it right the first time. (p. 19)

Indeed, when I think of building systems, I think of hours spent thinking about interesting problems, discussing them with a team, and then doing the grunt work of coding the thing. In reality, most of the time is spent coding, the interactions are few and far between, and when they do happen, people simply get frustrated with questions like “What is the purpose of this?” and “Why do we need abstraction here at all?”:

My boss Neil Haller told me years ago that debugging isn’t what you do sitting in front of your program with a debugger; it’s what you do leaning back in your chair staring at the ceiling, or discussing the bug with the team. However, many supposedly agile nerds put processes and JUnit ahead of individuals and interactions. (p. 19)

In closing author challenges us to question our assumptions:

There’s a lot of advice, but very little of it is backed either by theory, data, or even a model of why you should believe a given piece of advice. Good testing begs skepticism. Be skeptical of yourself: measure, prove, retry. (p. 20)

Conclusion

The article gives a counter-argument to the much-hyped and almost clichéd concept of “unit testing”. It points out that unit testing is far from effective at actually testing the behavior of the whole system, adds needless maintenance to the codebase, does not increase the quality of the code or the system, and can very often be replaced with much better approaches.

I hope you give it a read and maybe pass it around at your next developer meeting.
