Unix Tip: Core dumps for dummies
If, once every several years, you have a problem with a process and end up with a troublesome core dump, you might be able to extract some useful information from the file before you delete it from your system. In this week's column, we look at an extremely simple script intended to help you analyze Solaris core dumps without requiring you to remember the relevant commands.
I call this script "anacore" for "analyze core dumps". Using a select command to create a menu, it offers the user several commands to get started with his core dump analysis -- file (which identifies the core file and the program that produced it), adb (a debugger), pstack (which displays a useful trace of the sequence of calls that led up to the core dump) and pflags (which shows information about the process's threads). In case the user isn't sure which one of the commands he wants to try, the script also offers a help command to list brief descriptions of each of the commands.
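#!/bin/bash
# anacore: offer a menu of commands for examining a core file

# Use the file named on the command line, or default to ./core
if [ $# -eq 1 ]; then
    file=$1
else
    file="core"
fi

echo "Select your choice by number"
select cmd in file adb pstack pflags help
do
    break
done

# select leaves $cmd empty when the reply is not a listed number
if [ -z "$cmd" ]; then
    echo "You need to select a valid option (by number)"
    exit 1
fi

if [ "$cmd" == "help" ]; then
    echo "file -- describe core file"
    echo "adb -- start debugger (^D to exit)"
    echo "pstack -- show traceback"
    echo "pflags -- show info on threads"
else
    "$cmd" "$file"
fi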
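As written, the script exits when the reply is not one of the listed numbers. A minimal sketch of an alternative, assuming you would rather be asked again: the function name choose_tool is hypothetical and not part of anacore, while PS3 (the select prompt) and REPLY (the raw text the user typed) are standard bash variables.

```shell
#!/bin/bash
# Hypothetical helper (not part of anacore): keep prompting until the
# user picks a valid menu entry, then print the chosen command name.
choose_tool() {
    local cmd
    local PS3="Select your choice by number: "
    # select writes the menu and prompt to stderr, so stdout carries
    # only the final choice.
    select cmd in file adb pstack pflags help; do
        if [ -n "$cmd" ]; then
            echo "$cmd"
            return 0
        fi
        # select leaves cmd empty on a bad reply; REPLY holds what was typed
        echo "'$REPLY' is not a valid option" >&2
    done
    return 1   # reached when input ends (^D) before a valid choice
}
```

A caller could then write cmd=$(choose_tool) || exit 1 and carry on with the help/else branch exactly as anacore does.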
Now, let's see the script in action. In the output shown below, the user wants a quick analysis of the core dump (i.e., what process created it):
# ./anacore
Select your choice by number
1) file
2) adb
3) pstack
4) pflags
5) help
#? 1
core: ELF 32-bit MSB core file SPARC Version 1, from 'DD2PROC'
Next, he wants to get a little more information about how the core dump came about.
# ./anacore
Select your choice by number
1) file
2) adb
3) pstack
4) pflags
5) help
#? 3
The user has just selected the pstack command. This command is part of the base operating system and is the recommended way to get tracebacks (or "backtraces") on Solaris systems, especially if the dbx tool is not available. The pstack command is able to display function arguments even for programs built without symbolic information. An example of pstack output is shown below:
core 'core' of 18317: DD2PROC
fed33464 strlen (dbf644, ffbffd16, 0, a, fed54584, ffbf4f77) + 80
fed545a4 getzname (ffbffd16, 1214c0, 0, 0, 43, ff00) + 50
fed540d0 _ltzset_u (45b63efd, 68c8c, 68ac0, fedbfb88, 1, feda78a6) + 74
fed53384 localtime_u (ffbf57ec, fedc286c, 3, 7efefeff, 849200, ff00) + 14
ff2dd4a0 __1cMCRformatDate6Flpkcpc_c_ (45b63efd, 849209, ff36e760, 0, 0, 3) + 238
ff2dd124 __1cMCRformatTime6Flpc_v_ (45b63efd, ff36e760, 0, ba74, 0, 0) + b4
ff2d8b0c __1cKCRspoolMsgHgetTime6kM_pkc_ (884fd8, ff3364bb, fedc02a4, 0, fedc2ef4, fedc02a4) + 3c
ff2d8b70 __1cKCRspoolMsgIgetOMhdr6Mpci_v_ (884fd8, ffbf5ba4, f423f, ffbf5c04, 81010100, 0) + 30
ff2d9c90 __1cKCRspoolMsgJsendToSop6MppnICRomDest_ii_v_ (884fd8, ffbf5ba4, 1, f423f, 2, ff33640a) + 60
ff2d9c04 __1cKCRspoolMsgJsendToSop6MppnICRomDest_i_v_ (884fd8, ffbf5cd0, 1, 2, 493e0, 1) + 24
ff2d9bb8 __1cKCRspoolMsgJsendToSop6MpnICRomDest_2222_v_ (884fd8, ffbf5d9c, 0, 0, 0, 0) + 118
ff2d685c __1cKCRspoolMsgHprmSend6M_h_ (884fd8, 88513c, 54450000, 7efefeff, 81010100, ff00) + 13c
ff2db21c __1cKCRspoolMsgIsendSegs6Mkic_v_ (884fd8, 0, 0, 7efefeff, 81010100, 31) + 374
ff2d845c __1cKCRspoolMsgFspool6M_v_ (884fd8, 885064, ffffffff, 20, 852590, 885024) + 44
ff2d66f8 __1cKCRspoolMsgIprmSpool6M_v_ (884fd8, ffffffff, ffbfe0a0, 1, ff06a014, ffffffff) + 58
ff2d636c __1cFCRmsgIprmSpool6MCpkcE_v_ (846058, 8, ff34123c, feb03a44, 0, ff341585) + 94
ff30d9e0 _in_progress (ff341585, ffbfe2cc, 7, 64, 0, 0) + 310
ff30fffc __1cH_insync6F_v_ (0, 0, 0, ffbfe1f9, d, d) + b74
ff30bff0 __1cGINmain6Fhppc_v_ (1, ffbfe2cc, 15ba8, ffbfff8d, ff06a6d0, 43f8) + 3e0
ff30c920 main (1, ffbfe2cc, ffbfe2d4, c9400, 0, 0) + 18
000208b8 _start (0, 0, 0, 0, 0, 0) + 108
The trace reads from the innermost call at the top down to _start at the bottom. The process in this example dumped core inside the strlen function, which had been called from getzname.
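If all you want from the trace is the name of the function at the top, a small filter can pull it out. This is a rough sketch rather than anything anacore does: the function name top_frame is hypothetical, and it assumes single-threaded output laid out like the example above, with one header line followed by one frame per line. Output for multi-threaded processes has additional per-thread separator lines that this sketch does not handle.

```shell
#!/bin/bash
# Hypothetical helper: print the function name of the innermost frame
# from pstack-style output read on standard input.
top_frame() {
    # Line 1 is the "core 'core' of PID: name" header. Line 2 is the
    # innermost frame: "<address> <function> (<args>) + <offset>",
    # so the function name is the second whitespace-separated field.
    awk 'NR == 2 { print $2; exit }'
}
```

On a Solaris system you might run it as pstack core | top_frame, which for the trace shown here would print strlen.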
Now, let's have the user select pflags.
# ./anacore
Select your choice by number
1) file
2) adb
3) pstack
4) pflags
5) help
#? 4
core 'core' of 18317: DD2PROC
data model = _ILP32
/1: flags = PR_PCINVAL
sigmask = 0xffffbefc,0x00003fff cursig = SIGSEGV
Here, the cursig field shows that the process was killed by SIGSEGV, a segmentation fault, which is generally associated with improper memory handling.
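The signal name can also be picked out of the pflags output mechanically. The sketch below is only an illustration: the function name core_signal is hypothetical, and it assumes the "cursig = NAME" layout seen in the output above.

```shell
#!/bin/bash
# Hypothetical helper: print the signal named in the "cursig = ..."
# field of pflags-style output read on standard input.
core_signal() {
    # -n suppresses normal output; the trailing p prints only the lines
    # where the substitution matched, reduced to the signal name.
    sed -n 's/.*cursig = \([A-Z0-9]*\).*/\1/p'
}
```

With the output shown above, pflags core | core_signal would print SIGSEGV. If no line carries a cursig field, it prints nothing.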
Selecting option 2, adb, starts the debugger. Again, we see evidence of the segmentation violation and the program that caused it:
# ./anacore
Select your choice by number
1) file
2) adb
3) pstack
4) pflags
5) help
#? 2
adb
core file = core -- program `` /data/build/loads/cc/dd2proc/cproc/EE/diams ch'' on platform SUNW,UltraAX-i2
SIGSEGV: Segmentation Fault
If the user wants help, he selects option 5:
# ./anacore
Select your choice by number
1) file
2) adb
3) pstack
4) pflags
5) help
#? 5
file -- describe core file
adb -- start debugger (^D to exit)
pstack -- show traceback
pflags -- show info on threads
The script doesn't help with the debugger past starting it and, in the help output, telling the user how to exit.
Clearly, analyzing the cause of a core dump is considerably more complicated than this simple script can handle. On the other hand, for those of us whose reaction to most core dumps is simply to remove them from our systems immediately, a quick analysis of "what" caused the core dump and "why" might turn out to be useful, especially if the same problems occur again and again.
If you rarely look at core dumps and want to get some quick feedback when you run into one without having to remember the commands, this might be useful.
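If you would rather skip the menu altogether, the three non-interactive commands can be run in a single pass. This is a sketch under stated assumptions, not part of anacore: the function name core_report is hypothetical, and it simply skips any tool that is not on the PATH, since pstack and pflags as used here are Solaris commands.

```shell
#!/bin/bash
# Hypothetical companion to anacore: run file, pstack and pflags
# against one core file without prompting.
core_report() {
    local core=${1:-core} tool
    if [ ! -r "$core" ]; then
        echo "cannot read core file: $core" >&2
        return 1
    fi
    for tool in file pstack pflags; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool $core =="
            "$tool" "$core"
        else
            # Not every system ships every tool
            echo "== $tool not found, skipping ==" >&2
        fi
    done
}
```

Called with no argument, core_report looks for a file named core in the current directory, the same default anacore uses. The debugger is left out because adb is interactive.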