My primary training in troubleshooting (and cursing) came at my father’s side, hovering over the engine of every car my parents ever owned, accompanied by his endless appeals to “point the light over here dammit”. That training (more the troubleshooting than the cursing, but ok, both) has served me well in working with technology and systems over the years.
A while back, I shared some thoughts on troubleshooting. For our purposes here, let’s boil it down to three basic parts:
- Define the Problem – Make sure you have clearly identified and defined the problem and its impact.
- Identify the Cause – Figure out where the problem lies (assuming it is not a matter of misunderstanding or unmet expectations).
- Remediate – Fix the problem and clean up any fallout.
For the longest time, I found myself grasping for an abstract way to talk about the second part, identifying the cause, when training others in troubleshooting. A few years ago I found something useful in the book Simplicity, by Edward de Bono.
Imagine that somebody hands you a blank piece of paper and a pencil. They have chosen an invisible spot on that piece of paper and it’s your job to find it. You can ask them questions to which they will strictly answer Yes or No.
How do you proceed to find the spot?
Very sensibly you draw a line dividing the paper into two parts, A and B. You ask if the spot is in A. If it is not in A then it must surely be in B. There is nowhere else it could be. The alternatives of A and B cover all possibilities. Then we proceed to divide B into C and D. And so on. In the end we must find the point because at every moment we have covered all alternatives.
– p. 122
What he is describing is essentially a binary search (also called a binary chop, hence the title of this post). And while it is an imperfect representation of how you can identify root cause when troubleshooting, it is nevertheless a useful one. It helps you consider scope, how to proceed methodically, and how to rule out global problems as you narrow down the focus of your investigation to find the cause.
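To make the analogy concrete, here is a minimal sketch of that chop applied to a request that passes through a chain of components. The stage names and the `works_through` probe are hypothetical stand-ins for whatever check you can actually run (a ping, a curl, a log lookup). The sketch also assumes a single persistent fault, so every check past the broken stage fails too.

```python
from typing import Callable, Optional, Sequence


def find_first_failure(
    stages: Sequence[str],
    works_through: Callable[[int], bool],
) -> Optional[str]:
    """Return the first stage where the check fails, or None if all pass.

    works_through(i) answers the yes/no question "is everything healthy
    up to and including stages[i]?". Assumes a single persistent fault:
    once a stage fails, every later check fails as well.
    """
    lo, hi = 0, len(stages)  # the culprit, if any, lies in stages[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if works_through(mid):
            lo = mid + 1  # "yes" rules out everything up to mid
        else:
            hi = mid  # "no" rules out everything after mid
    return stages[lo] if lo < len(stages) else None


# Hypothetical chain of components, with the fault planted at "web server".
stages = ["client", "dns", "load balancer", "web server", "app", "database"]
culprit = find_first_failure(stages, lambda i: i < 3)
```

Each answer, yes or no, discards about half of what is left. That is the same reason de Bono's line across the paper works.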
It’s important to acknowledge once more that this is an imperfect representation.
You often have a lot more information to work with that will point you towards the likely cause. If you have some confidence you know the cause based on the information at hand, it may be smart to shortcut the process outlined above, provided the risk is low enough. In the example with the blank paper and the invisible spot, if you have a strong reason to believe you know where the spot is and there is nothing to lose by asking if it’s there, take your shot and ask.
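A rough way to picture that shortcut is to count questions. In the sketch below, `locate`, the candidate names, and the `is_fault_in` oracle are all invented for illustration, and it assumes exactly one culprit among the candidates. It asks about the hunch first and falls back to halving only if the hunch is wrong.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def locate(
    candidates: Sequence[str],
    is_fault_in: Callable[[Sequence[str]], bool],
    hunch: Optional[str] = None,
) -> Tuple[str, int]:
    """Return (culprit, questions_asked).

    Assumes exactly one culprit among the candidates. is_fault_in(group)
    answers the yes/no question "is the culprit somewhere in this group?".
    """
    asked = 0
    remaining: List[str] = list(candidates)

    # Take the cheap shot first if there is a strong suspicion.
    if hunch is not None:
        asked += 1
        if is_fault_in([hunch]):
            return hunch, asked
        remaining = [c for c in remaining if c != hunch]

    # Otherwise split what is left in half until one candidate remains.
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        asked += 1
        remaining = half if is_fault_in(half) else remaining[len(half):]
    return remaining[0], asked


# Eight made-up suspects; the fault is secretly in "f".
suspects = ["a", "b", "c", "d", "e", "f", "g", "h"]
oracle = lambda group: "f" in group

no_hunch = locate(suspects, oracle)                # ("f", 3)
good_hunch = locate(suspects, oracle, hunch="f")   # ("f", 1)
bad_hunch = locate(suspects, oracle, hunch="a")    # ("f", 4)
```

In this toy case a good hunch saves two questions and a bad one costs a single extra question. That is the "nothing to lose" situation. When each question is expensive or risky, the trade looks different.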
Additionally, de Bono provides some guidelines around when this kind of thinking is useful:
…in some situations there are a limited number of alternatives. It is perfectly true that in some situations analysis can reveal the fixed number of alternatives in such closed situations.
– p. 122
Where it is a matter of search and where the “no” answer has a real value (by closing a possibility) the process is valuable.
– p. 123
I’ve found that information technology systems (and my parents’ cars) fall within those guidelines, and that his example with the paper and the invisible spot is a useful way to help others think about troubleshooting and getting at root cause.