December 02, 2012, 7:26 PM -- Effective troubleshooting requires that you know how to break a problem into pieces and track down evidence, and that you understand your systems and applications -- as well as the tools at your disposal -- well enough to analyze problems when they rear their ugly heads. You can learn these skills over decades of working with Linux systems, or you can jumpstart the process by reading a book that provides you with someone else's insights -- or both!
DevOps Troubleshooting: Linux Server Best Practices provides a lot of practical insights and tricks to help you get up to speed as a competent Linux troubleshooter. The "DevOps" part of this title refers to systems administrators working together with quality assurance engineers and developers -- a collaboration that can bring additional insights to bear on a wide range of problems. The approach that this multidisciplinary team takes involves a lot of sharing and communication and can make quick work of even complicated problems. But even if you're completely on your own, the techniques and suggestions in this book can help you resolve problems more quickly and effectively and might even get you thinking about what you can do NOW that will come in handy when a problem arises.
To be good at troubleshooting, you first need an overall approach to problem solving. You are likely, for example, to ask yourself questions like "When did this last work properly?" or "What has changed recently?" You might even compare a system or application that isn't functioning properly with another that isn't exhibiting the problem. Why does one work while the other keeps crashing? The book covers many approaches like these in its "Troubleshooting Best Practices" chapter.
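As a rough illustration of comparing a working system with a failing one, here is a minimal Python sketch. It is not taken from the book; the function name `diff_settings` and the sample settings are made up for this example, and it assumes you have already collected each system's settings into a dictionary.

```python
def diff_settings(working, failing):
    """Return the settings that differ between a working and a failing system.

    Both arguments are plain dicts of setting name -> value (hypothetical
    data gathered by whatever means you prefer).
    """
    differences = {}
    for key in sorted(set(working) | set(failing)):
        good = working.get(key, "<missing>")
        bad = failing.get(key, "<missing>")
        if good != bad:
            differences[key] = (good, bad)
    return differences


# Made-up sample data for two hosts.
working_host = {"kernel": "3.2.0", "ntp": "enabled"}
failing_host = {"kernel": "3.2.0", "ntp": "disabled", "swap": "off"}

for name, (good, bad) in diff_settings(working_host, failing_host).items():
    print(f"{name}: working={good} failing={bad}")
```

Anything the two systems have in common drops out, which leaves a short list of differences to investigate first.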
Insights provided in this chapter can help you to characterize the nature of a failure. Does it happen all the time or just once in a while? Is it reproducible or completely random? What error messages or log entries, if any, might help you understand what is going wrong?
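One way to answer the "all the time or just once in a while?" question is to repeat a check and count the failures. The sketch below is only an illustration, not the book's method; `characterize` and `flaky_check` are hypothetical names, and the check is assumed to be any function that returns True on success.

```python
from itertools import cycle


def characterize(check, attempts=20):
    """Run check() repeatedly and describe how often it fails."""
    failures = sum(1 for _ in range(attempts) if not check())
    if failures == 0:
        return "not reproduced"
    if failures == attempts:
        return "consistent"
    return "intermittent"


# A stand-in check that fails every third call (made up for the example).
results = cycle([True, True, False])


def flaky_check():
    return next(results)


print(characterize(flaky_check, attempts=9))
```

A "consistent" result suggests the problem can be reproduced on demand, while "intermittent" suggests looking at timing, load, or other conditions that vary between runs.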
The author also warns us not to be too quick to reboot. You might erase evidence that you need and never understand what went wrong, leaving yourself vulnerable to a likely recurrence with nothing to show for your efforts.
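If a reboot can't be avoided, one precaution is to copy likely evidence somewhere safe first. The following Python sketch is an assumption about how that might be done, not advice from the book; the function name `preserve_evidence` and any paths you pass to it are hypothetical.

```python
import shutil
import time
from pathlib import Path


def preserve_evidence(paths, dest_root):
    """Copy the files that exist into a new timestamped directory.

    Returns the directory that was created. Files are stored under their
    base names, so two sources with the same name would overwrite each other.
    """
    snapshot = Path(dest_root) / time.strftime("evidence-%Y%m%d-%H%M%S")
    snapshot.mkdir(parents=True, exist_ok=True)
    for path in map(Path, paths):
        if path.is_file():
            # copy2 also preserves modification times where it can.
            shutil.copy2(path, snapshot / path.name)
    return snapshot
```

You might call it with a list of log files and a destination on a disk that will survive the reboot, then examine the copies at leisure once the system is back up.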
Troubleshooting is an acquired skill.