How Test-Driven Development Works (And More!)

It surprises me, from time to time, how much I still need to justify test-driven development to prospects and would-be course attendees. Many feel that TDD has crossed the chasm, while others still see TDD as a cultish practice worth marginalizing. I take some blame for those who find TDD cultish, because until now I haven’t had a strong, sensible, theoretical basis to justify TDD as an idea. I could do no better than “it works for me” or “my friends like it”. That has changed since I’ve started giving my talk “Introduction to Agile with the Theory of Constraints” in which I use concepts from Theory of Constraints to motivate the practices of agile software development, notably those of extreme programming. If you buy in to ideas from Theory of Constraints or Lean Manufacturing, then I think I now have a stronger argument to justify the core programming practices in extreme programming in particular and agile software development in general. I don’t even need all of the Theory of Constraints but rather a simple appeal to fundamental concepts in Queuing Theory.

Queuing Theory?

Yes, Queuing Theory. (And I don’t plan to capitalize that any longer.) I don’t proclaim to have any particular expertise in this area, but I have already seen how to use queuing theory ideas in optimizing network-based systems, and I see no reason we couldn’t extend that to software delivery systems. Better, I only need to appeal to a single idea from queuing theory to make my point.

Given a process B, which follows a process A, sometimes in performing B we need to perform some of A again. We can remove the need to rework by taking some portion of process B and performing it before process A¹.

This merits a diagram. If we have this problem

then we can solve it by doing this

and the resulting system will work more efficiently by removing wasteful rework. I assume here that we derive no significant benefit from the rework itself, which I suppose I must justify, but let’s not ruin a good story with the truth. Here I’ve described the general problem, and by applying it to software development, I can… well, I find it more effective if I save the punchline for the end.

Winston Royce, 1970, revisited

I imagine you know this diagram

and appreciate that Royce wrote in his now infamous paper that this single-phase waterfall is risky and invites failure. If you don’t appreciate that, then I cannot strongly recommend enough your reading the original paper in its entirety, rather than stopping after page 2 as most people have done².

We can apply the queuing theory result I’ve just cited to this diagram and generate some interesting conclusions. I’ll start by focusing in on this portion of the system

We write code, then we test it. Sadly, we occasionally find a bug³ which makes us change the code we wrote after we thought we’d finished it. That makes a loop of the type we can unravel with our queuing theory result.

Since “coding” is process A and “testing” is process B, we need to do some testing before we start coding.

It doesn’t take long for this to become a virtuous loop where we write only the code we need to write in order to pass the tests we write.

I use the term test-first programming to describe this cycle⁴. When we practise test-first programming, we design as much detail as we can before writing the first test, then use the tests to help us type in our implementation correctly. Most teams most of the time can use test-first programming to reduce their ~~defect~~ mistake count to near zero, which increases their productivity and improves their ability to deliver, by helping them waste less time agonizing over whether to fix mistakes late in a release. I started this way in 2000 when I first discovered JUnit and stopped making silly mistakes in the code I wrote, which I found significantly beneficial in helping me code more confidently. I still designed most of what I built mostly up front.

After a while, though, I recognized a new process loop: I found some parts of my design difficult to test, or I found some parts of my design didn’t fit together when I tried to type them in.

Returning to our queuing theory result, since “designing” is process A and “doing test-first programming” is process B, we need to do some test-first programming before we start designing.

It doesn’t take long for this to become a virtuous loop where we check our design ideas as we think of them and implement only the parts of the design we can justify needing. When we include refactoring in our practice, we can confidently “under-design” compared to the level of design we expect to need by the end of a task, which I believe amounts to designing appropriately for the code we need to implement right now. This virtuous loop combines test-first programming and evolutionary design, including guiding principles like “you aren’t gonna need it” and the four elements of simple design into test-driven development, where we check our implementation by running tests and we check our design ideas by writing tests.

Where test-first programming helps most teams most of the time reduce their mistake count to near zero, test-driven development helps them reduce their design inventory—mostly code that gets in our way because it doesn’t actively help us deliver a feature—to near zero. This further increases productivity and improves their ability to deliver by helping them waste less time agonizing over design problems they find costly to fix. I waited until I’d spent an entire release practising test-first programming before doing more test-driven development. My transition consisted of trying to do less and less up-front design for each task, letting myself feel comfortable with each new step. Within two years I estimate I designed about 5% as much up front as I did before I started practising test-first programming. I can’t measure the corresponding improvement in my design, but I look back at projects that took 3 months before I practised test-driven development that I now feel confident I could complete—truly complete—in one week. Of course, we can’t stop here!

Enter our friend analysis. To simplify the discussion, I will treat analysis as “discovering the features we want in our software” without forcing myself to state too precisely how that happens⁵. Once again, we have our familiar situation.

Once again, we face the situation where in the process of implementing features we discover new features we need, current features we don’t need, and learn new things about features we know we need to build. This adds to our analysis, meaning that we should try test-driving some features before we try to implement others.

It doesn’t take long for this to become a virtuous loop in which our desire to implement (and deliver!) features drives them ever smaller, as we extract more concentrated value out of each one⁶. When we implement feature 12 we learn something about features 23, 30 and 52. We might decide not to deliver feature 30 any more. We might decide to expand feature 23 to encompass a few more key cases. We might decide to rush feature 52 to the top of the pile. Most teams most of the time find that this cycle helps them reduce the number of rarely- or infrequently-used features in their system⁷. This yet again increases productivity and improves their ability to deliver meaningful software to their stakeholders by eliminating the time wasted on delivering too much of a feature too soon, the time wasted on entire features we thought we needed but realized we don’t, and the time wasted arguing about what a feature means, rather than writing examples together: business-oriented tests that describe how a feature works in enough detail for the business and technical project community to agree on the conditions of satisfaction for delivering the feature.

I call this behavior-driven development, and refuse to spell it with the u that provides as much value to the word as your appendix does to your body French u, since I’m writing in English⁸.

Once again, I didn’t coin the phrase, and some might argue against the way I use it, but I find it apt. This cycle include practices like business and technical people writing examples together, feature injection, feature splitting, and value-based (rather than cost-based) planning.

At this point, I think I’ve done my job. I believe I’ve justified not only test-first programming or test-driven development, but full-on behavior-driven development, using only a single result from fundamental queuing theory. I’ve made only a single assumption—that we agree on the appropriateness of applying queuing theory to a software development system. I’ve tried to add as little as possible to my reasoning in order to keep it as context-free as possible. As a result I claim that most teams most of the time will benefit from moving along the path from code-and-fix to test-first programming to test-driven development to behavior-driven development.

Now, for homework, what happens when we consider these processes?

Surely at least one you’ve needed to deliver more features for software you’d already deployed. How well does that work? What problems do you encounter? What if you applied our new favorite queuing theory result to that rework loop?

¹ I really need a citation for this, and when I find it, I will place it here.

² I digress, but I really can’t help myself on that one.

³ Also known as defect or, for the truly congruent, mistake.

⁴ Clearly I didn’t coin the phrase, but I know many people who treat “test-driven development” as a simple renaming of “test-first programming”, and I believe making a stronger distinction adds real value to the conversation.

⁵ I don’t think “gathering requirements”, as though we could pick them like berries, fits as a metaphor. I like “trawling for requirements”, which I believe I first read in Mike Cohn’s User Stories Applied.

⁶ We can easily apply the “Pareto Distribution” here in that we can deliver 80% of the value from implementing 20% of the feature.

⁷ You recall that Jim Johnson of the Standish Group reported in 1994 that 45% of developed features are “never used”. As I recall, only 7% of features were used very frequently.

⁸ My Canadian and British brethren and sistren be damned. I assert my right as a Canadian to choose the British spelling when I prefer it and the American spelling when it saves me time.