You need to learn how to quickly characterize a problem, how to track down clues about its nature, how to simplify a problem or break it into manageable pieces, and how to bring the proper tools and maybe even the proper colleagues to bear.
This book gives a quick but effective introduction to the art of troubleshooting and then homes in on some of the most common problem areas:
- slow systems, RAM shortages, taxed CPUs, excessive disk I/O
- booting problems
- full or corrupt disks
- network problems
- name resolution problems
- email problems
- web site problems
- slow database problems
- faulty hardware
Each of these chapters suggests how to approach the particular problem. In the chapter on tracking down web site problems, for example, questions like "Is the server running?" and "Is the remote port open?" walk you through the logical steps as you close in on what is wrong. Suggestions such as "test the remote port locally", along with tips on checking your firewall rules, keep you moving in the right direction.
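To make the "Is the remote port open?" step concrete, here is a minimal sketch using Python's standard `socket` module. It is not taken from the book, and the function name `port_is_open` is hypothetical. It simply attempts a TCP connection and reports whether the handshake succeeded within a timeout.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout is an OSError subclass)
        # and DNS lookup failures (socket.gaierror) all land here.
        return False
```

Running `port_is_open("localhost", 80)` on the web server itself and then `port_is_open("www.example.com", 80)` from a remote machine mirrors the "test the remote port locally" advice: if the local test succeeds but the remote one fails, a firewall rule or the network path is a likely suspect.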
The chapter then offers suggestions for testing from the command line, using tools such as curl and telnet. It explains HTTP return codes so that you can understand what you are seeing in your web logs.
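As a small illustration of reading return codes, the sketch below classifies a status code by its first digit, which is the standard HTTP convention (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error), and looks up the reason phrase with Python's `http.HTTPStatus`. The helper name `describe_status` is hypothetical and the example is not drawn from the book.

```python
from http import HTTPStatus

# Standard HTTP status classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a human-readable description such as '404 Not Found (client error)'."""
    category = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        # HTTPStatus raises ValueError for codes it does not define.
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "non-standard code"
    return f"{code} {phrase} ({category})"
```

Applied to the status field of a log line, a cluster of 5xx results usually points at the server or application, while a run of 4xx results points at the requests themselves (missing files, bad URLs, or permissions).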
The chapter also points out that, if you can get your hands on some server statistics, you may be able to tell whether the server is overwhelmed or barely busy. It then shows you how to run some simple tests on your Apache configuration file, how to spot permissions problems, and how to recognize sluggish or unavailable servers.
One of the things that leaves so many of us unprepared for problems is that we don't pay enough attention to how a system behaves when it's working properly, so we can't pick out what is different when it isn't. In keeping with this, one of the take-home messages of this book is that we should all be careful to document problems and their solutions. How many times have you encountered a problem that looks familiar, but been unable to recover anything from past incidents that could help you this time around? If you document the problem's symptoms, its root cause, and the steps that resolved it, you may be spared that haunting sense of déjà vu!
DevOps Troubleshooting: Linux Server Best Practices is an extremely helpful book and one that every DevOps team member and Linux administrator should read.