I recommend that you take a minute or so to read “The Diagnostics Plug” by Michael Mahlberg. Including optional diagnostic information in our software systems seems so obviously valuable, but few people do it. Why?

  • It requires one of two things that are not easily done: either
    • deciding now what’s important to measure and report, or
      • How can we know that now? Since we can’t know it all, we either try to figure it all out (and never finish) or give up before we start (and do nothing).
    • making it easy to measure and report “anything” when we choose to do it.
      • We have other, more urgent things to do first.
        • Of course, in doing those things, we cut the very corners that could have made it easy to measure and report “anything” later.

What do we get instead?

  • No useful diagnostic information at all, or
  • Simple, obvious diagnostic information that stops finding problems over time, or
  • Total diagnostic information overload, which trains us to stop listening to it, or
  • The needle/haystack problem, which leaves us only two choices: turn the diagnostics off and defeat the purpose, or turn them on and never find anything useful amid the noise.

What could we do about this? We could write smaller things that we find easier to reason about; connect them loosely and make them swappable; extract reusable parts and reuse them so that we have more uniform (and therefore trustworthy) infrastructure.
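
To make that less abstract, here is one possible shape for it: a minimal sketch in Java (not taken from Mahlberg’s article; Diagnostics, NoOpDiagnostics, ConsoleDiagnostics, and InvoiceProcessor are names I invented for illustration) of a small, swappable diagnostics seam.

    // A sketch only: these names are illustrative, not from the article.
    interface Diagnostics {
        void record(String event, Object detail);
    }

    // "Off" is just another implementation, so diagnostics stay optional.
    final class NoOpDiagnostics implements Diagnostics {
        @Override
        public void record(String event, Object detail) {
            // deliberately does nothing
        }
    }

    final class ConsoleDiagnostics implements Diagnostics {
        @Override
        public void record(String event, Object detail) {
            System.out.println(event + ": " + detail);
        }
    }

    // The business component depends only on the small Diagnostics interface,
    // so we can swap reporting in or out without touching business code.
    final class InvoiceProcessor {
        private final Diagnostics diagnostics;

        InvoiceProcessor(Diagnostics diagnostics) {
            this.diagnostics = diagnostics;
        }

        void process(String invoiceId) {
            diagnostics.record("invoice.received", invoiceId);
            // ...business behavior goes here...
            diagnostics.record("invoice.processed", invoiceId);
        }
    }

Turning diagnostics on or off then becomes a wiring decision rather than an edit to the business code.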

How do we do it?

  • Writing small, focused tests encourages the programmer to negotiate and understand the boundaries between things.
  • Writing small, focused tests encourages the programmer to become annoyed by too many unrelated responsibilities in one place. This encourages the programmer to separate infrastructure behavior from business behavior. (A small test along these lines is sketched after this list.)
  • Sensitivity to duplicate code encourages the programmer to notice patterns of infrastructure use and extract them from the business-related functions.
  • Sensitivity to duplicate code encourages the programmer to package and reuse infrastructure behavior, making it easier to trust and more obvious in the ways that it fails.
  • Sensitivity to the names of things encourages the programmer to become annoyed by too many unrelated responsibilities in one place as well as by related responsibilities spread too far apart throughout the design. This encourages the programmer to separate infrastructure behavior from business behavior and collect related infrastructure behavior into more easily reused packages.

There’s more, but this will do.
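
To make the first two of those concrete, here is a small, focused test, again only a sketch (JUnit 5, reusing the invented Diagnostics and InvoiceProcessor names from the earlier sketch), that checks what the business component reports while staying indifferent to how the reporting infrastructure delivers it.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.ArrayList;
    import java.util.List;
    import org.junit.jupiter.api.Test;

    class InvoiceProcessorTest {
        // A tiny in-memory fake stands in for the real reporting infrastructure.
        static final class RecordingDiagnostics implements Diagnostics {
            final List<String> events = new ArrayList<>();

            @Override
            public void record(String event, Object detail) {
                events.add(event);
            }
        }

        @Test
        void reportsTheInvoiceLifecycle() {
            RecordingDiagnostics diagnostics = new RecordingDiagnostics();

            new InvoiceProcessor(diagnostics).process("invoice-42");

            assertEquals(List.of("invoice.received", "invoice.processed"), diagnostics.events);
        }
    }

The minor annoyance of writing that fake is exactly the pressure that pushes infrastructure behavior out of the business code and behind a boundary we can name.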

If we did these things, then what would we get? Clear contracts between components. What would that give us?

  • Easier to detect failures, because we know more precisely what success and failure mean.
  • Easier to report failures clearly and to understand what they mean, because each component has a clearer and narrower purpose.
  • Easier to spot patterns of infrastructure failure, because of higher cohesion in infrastructure behavior.
  • Easier to distinguish infrastructure failures (file, database, network) from business failures (policy infractions, inconsistent data/conclusions, incorrect data/conclusions). (One way to make this distinction part of a component’s contract is sketched after this list.)
  • Infrastructure upgrades more likely to be effective (due to higher scalability) and less likely to be risky (due to greater precision of expectations and more thorough checking and testing).
  • Easier to fix problems.
    • Easier to isolate failures, because components are more cohesive, so failures are less likely to cross boundaries.
    • Easier to understand failures, because we more precisely understand the stated/intended contracts of components.
    • Easier to identify failures, because we can express them in terms of either incorrect or missing parts of the contract for one or more components.
    • Easier to decide how to fix failures, because we can express the desired behavior in terms of changing or adding to the contract for one or more components.
    • Easier to fix failures confidently, because we can express our expectations more precisely as contracts and can either check the contracts directly or check with examples (tests).
    • Less risky to fix failures, because higher cohesion makes it less likely that changes will have unforeseen side effects.
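
To show one form such a contract might take, here is another sketch (the names CustomerCatalog, Customer, BusinessRuleViolation, and InfrastructureFailure are invented for illustration, not taken from the article) in which a component states in its signature which kind of failure it might signal.

    // Checked exceptions make the two failure modes part of the stated contract.
    class BusinessRuleViolation extends Exception {
        BusinessRuleViolation(String message) {
            super(message);
        }
    }

    class InfrastructureFailure extends Exception {
        InfrastructureFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    record Customer(String id, String name) {}

    // The contract: findCustomer returns a customer, or signals a business
    // failure (no such customer, a policy infraction), or signals an
    // infrastructure failure (the data store could not be reached).
    // Callers must decide, separately and explicitly, how to handle each.
    interface CustomerCatalog {
        Customer findCustomer(String customerId)
                throws BusinessRuleViolation, InfrastructureFailure;
    }

When a failure report arrives, its type already tells us which half of the system to suspect, and that alone does much of the work described in the list above.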

These consequences compound their effects over time. Cost savings accelerate. Support costs no longer grow with the size of the code base. Capacity to add features remains relatively constant, while the cost to add features grows much more slowly than usual.

References

Michael Mahlberg, “The Diagnostics Plug – a missing abstraction in most systems”.

J. B. Rainsberger, “The Four Elements of Simple Design”. Learn more about the principles underlying this article’s “How do we do it?” section.

Corey Haines, “Understanding the Four Elements of Simple Design”. Examples of the results of applying these fundamental principles of design.