In 2007 I did a one-year stint providing technical support to our clients. During that time I put together a one-page guide on support and troubleshooting to help other people. A little later that year, I spent time training in Kepner-Tregoe’s KT Resolve process. KT Resolve was a lot more in depth than what I came up with but similar in a couple of ways. I found KT Resolve to be a thorough process but too formal and heavy for most situations I encountered.
Although I’m familiar with some basic ITIL ideas, I haven’t really had any in-depth exposure. Much of my thought on the topics of support and troubleshooting is homegrown.
Now that I’m further down the road in my career, my ideas on both support and troubleshooting are a bit different. For now though, I’ll just post what I came up with in 2007 and see what other people have to say. I did touch this up a bit but it’s largely the same as when I wrote it.
1. Define the Problem
- Is it a problem or an information request?
- What is the difference between what is happening and what “should” be happening?
- When did the problem start?
- What are the exact symptoms?
- What steps led up to the problem?
- What steps were tried after the problem occurred?
- Is this random or repeatable?
- Did this happen just once or does it happen sometimes or all the time?
- Is it happening for just one user, some users or all users?
- Is it happening on just one machine, multiple machines or all machines?
- Is this happening in just one location, multiple locations or all locations?
- Get context: environment, systems, versions, compatibilities, people, processes, etc.
2. Evaluate Impact and Urgency
- How many people are affected?
- To what degree are people affected?
- When does this need to be addressed?
- What happens if this is not addressed at all?
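The questions above can be turned into a rough triage score. What follows is only a sketch: the `Assessment` fields, the thresholds (50% of people, 4 hours, 48 hours) and the priority labels are all hypothetical choices made for illustration, not part of the original guide or of any formal standard.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    people_affected: int       # how many people are affected
    total_people: int          # size of the user population (assumed known)
    blocked: bool              # True if affected people cannot work at all
    hours_until_needed: float  # when this needs to be addressed


def priority(a: Assessment) -> str:
    """Combine impact and urgency into a coarse priority label."""
    if a.total_people <= 0:
        raise ValueError("total_people must be positive")
    share = a.people_affected / a.total_people

    # Impact: how many people, and to what degree (hypothetical thresholds).
    if a.blocked and share >= 0.5:
        impact = 2
    elif a.blocked or share >= 0.5:
        impact = 1
    else:
        impact = 0

    # Urgency: how soon it must be addressed (hypothetical thresholds).
    if a.hours_until_needed <= 4:
        urgency = 2
    elif a.hours_until_needed <= 48:
        urgency = 1
    else:
        urgency = 0

    labels = ["low", "low", "medium", "high", "critical"]
    return labels[impact + urgency]


outage = priority(Assessment(40, 50, True, 2))
annoyance = priority(Assessment(1, 50, False, 200))
```

The exact numbers matter less than answering the same questions the same way for every issue, so that two problems can be compared.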
3. Identify the Cause
- Has this been seen before? Are there any low-effort, low-risk solutions worth trying?
- Is it necessary and/or possible to even identify a cause? Or should effort just be focused on achieving the desired state?
- What are all the possible problem sources? Think people, processes, machines, systems, environments, etc.
- What is the scope of each problem source? One person, some people, all people, processes, machines, etc?
- Have there been changes in any possible problem sources recently?
- What other kinds of problems is the client experiencing in their environment?
- Is this really a problem or is there a gap between what exists and what is expected?
- Is someone trying to do something that the current state wasn’t designed for?
- Does the current state even need to be modified or should user/consumer expectation be modified?
- Is a process wrong? Is the client trying to do something that they may not need to do at all?
- Start with a very broad focus and repeatedly bisect the search area to home in on the root cause. Use “Why” questions to drill down on specific items from your problem definition. Trace the chain of causality.
- Develop a primary hypothesis and a secondary hypothesis to force you to think about the problem from different angles.
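The bisecting idea can be sketched in code. This minimal example assumes the possible sources can be put in order (here, a made-up list of recent changes, oldest first) and that the problem shows up from the culprit onward. The names `first_bad`, `changes` and `reproduces_after` are hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first candidate for which is_bad() is True.

    Assumes everything before the culprit is good and everything from
    the culprit onward is bad, so each check halves the search area.
    """
    if not candidates:
        raise ValueError("no candidates to search")
    if not is_bad(candidates[-1]):
        raise ValueError("the problem does not reproduce at the last candidate")

    lo, hi = 0, len(candidates) - 1  # invariant: candidates[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid       # culprit is at mid or earlier
        else:
            lo = mid + 1   # culprit is after mid
    return candidates[lo]


# Hypothetical change log for a client environment, oldest first.
changes = ["os patch", "driver update", "new firewall rule", "app upgrade", "config tweak"]


def reproduces_after(change: str) -> bool:
    # Stand-in for a real check, such as rolling back to this point and retesting.
    return changes.index(change) >= changes.index("new firewall rule")


culprit = first_bad(changes, reproduces_after)
```

With five candidates this takes three checks (one up front and two halvings) instead of up to five. In practice each "check" is a manual test, which is where halving the search area pays off.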
4. Develop the Solution
- Is there already a solution waiting to be implemented?
- What are the pros and cons of the solution?
- Does the solution need to clean up issues resulting from the original problem?
- What resources are needed to implement the solution?
- What prevents you from implementing your solution? Is it feasible?
- Never stop at the first solution. Think about a second solution too.
- What is Plan B if Plan A fails?
- Will the solution impact another area of the system or another system in the environment?
5. Implement the Solution and Clean Up
- Implement the solution and test.
- Document the solution.
As you work through the process, remember to:
- Always question the assumptions you are making at each step.
- Question all information given to you and make sure it is as clear, accurate and objective as possible. If somebody provides you with information, restate it to them and have them verify it.
- Always communicate throughout with the parties involved. Get others involved to make sure you’re not overlooking things or making assumptions.
- Be aware of framing the problem incorrectly. Are you framing a people/user problem as a technical problem? Are you framing a timeframe/workflow problem as a user problem?
- Be wary of disconnects between the solution/desired state and the problem/current state. Always ask: is this solution really addressing the problem?
- Quantify things. If somebody claims something is slow, ask them what they mean by slow. If they say performance is poor, have them give you a specific example of what they expect and what is currently occurring.
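As a small illustration of quantifying "slow", the sketch below times an action several times and compares the median against what the user says they expect. The helper names `measure` and `describe`, and the choice of median and worst case as the reported numbers, are assumptions made for this example.

```python
import statistics
import time
from typing import Callable


def measure(action: Callable[[], object], runs: int = 5) -> dict:
    """Time an action several times and summarize the samples in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return {"median_s": statistics.median(samples), "worst_s": max(samples)}


def describe(result: dict, expected_s: float) -> str:
    """Restate 'it is slow' as observed versus expected."""
    ratio = result["median_s"] / expected_s
    return (
        f"median {result['median_s']:.3f}s vs expected "
        f"{expected_s:.3f}s ({ratio:.1f}x)"
    )


# Stand-in for the operation the user calls slow (hypothetical workload).
result = measure(lambda: sum(range(100_000)), runs=3)
summary = describe(result, expected_s=2.0)
```

A statement like "the report takes a median of 6 seconds and the user expects 2" is something both sides can verify and agree on, which "it's slow" is not.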
Technical support isn’t just troubleshooting. A big part is both understanding and managing the expectations of your team, your management and your client.
As for troubleshooting itself, some people seem to have a natural ability for it and others don’t. That being said, I’m convinced anybody can develop strong troubleshooting skills.