Using prtdiag to troubleshoot system problems

By Sandra Henry-Stocker

The prtdiag command on Solaris systems is both a script and an executable. The script, /usr/sbin/prtdiag, does a little fact checking, such as confirming that your "uname -i" command yields a proper response (it should be your platform designation, e.g., SUNW,Sun-Fire-V240), and then runs the "real" prtdiag from its /usr/platform location. On a Sun Fire V240, for example, that location should be /usr/platform/SUNW,Sun-Fire-V240/sbin/prtdiag. Running /usr/platform/`uname -i`/sbin/prtdiag directly should work on any system. The command at this location is the binary that collects the information that prtdiag displays.
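
The hand-off described above can be sketched in a few lines of shell. This is only an illustration of the path logic, not the contents of the real /usr/sbin/prtdiag script. The function name prtdiag_path is made up for the example, and its optional argument exists only so the logic can be tried on a machine other than the one being diagnosed.

```shell
# Sketch only: build the platform-specific prtdiag path.
# prtdiag_path is a hypothetical helper, not part of Solaris.
prtdiag_path() {
    # Use the platform name passed as $1, otherwise ask uname -i.
    platform=${1:-$(uname -i)}
    if [ -z "$platform" ]; then
        echo "could not determine platform name" >&2
        return 1
    fi
    printf '/usr/platform/%s/sbin/prtdiag\n' "$platform"
}
```

On the system itself, running "$(prtdiag_path)" -v should then invoke the platform binary directly, assuming the path exists on that platform.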

Even without its verbose (-v) option, prtdiag provides a lot of information on your system's components, including status indicators such as "okay" and "online". The output shown below is a portion of the prtdiag output, showing the status of I/O devices. It took me a while to realize that "MB" represents the motherboard.


================================= IO Devices =================================
Bus    Freq Slot +     Name +
Type   MHz  Status     Path                         Model
------ ---- ---------- ---------------------------- --------------------
pci    66   MB         pci108e,1648 (network)
            okay       /pci@1f,700000/network@2

pci    66   MB         pci108e,1648 (network)
            okay       /pci@1f,700000/network@2,1

pci    33   MB         isa/su (serial)
            okay       /pci@1e,600000/isa@7/serial@0,3f8

pci    33   MB         isa/su (serial)
            okay       /pci@1e,600000/isa@7/serial@0,2e8

pci    33   MB         pci10b9,5229 (ide)
            okay       /pci@1e,600000/ide@d

pci    66   MB         scsi-pci1000,21 (scsi-2)
            okay       /pci@1c,600000/scsi@2
...
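
With many devices listed, it can help to filter this section down to the entries that are not "okay". The following is a rough sketch that assumes the two-line layout shown above, with a status word followed by a device path. The function name io_not_okay is hypothetical, and since the output here shows only "okay", the sketch simply reports any other status word it finds.

```shell
# Sketch only: list I/O devices whose status is anything other than "okay".
# io_not_okay is a hypothetical helper; it reads prtdiag output on stdin.
io_not_okay() {
    awk '
        # A "==== Title ====" banner starts a new section; track whether
        # we are inside the IO Devices section.
        /^=+ .* =+$/ { in_io = ($0 ~ /IO Devices/); next }

        # Status lines hold a status word followed by a device path.
        in_io && NF >= 2 && $2 ~ /^\// && $1 != "okay" { print $1, $2 }
    '
}
```

It would be used as "prtdiag | io_not_okay"; no output means every device path in the section reported "okay".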

The following portion of the prtdiag output illustrates the memory layout on the box. The system being interrogated has 2 GB of memory in four 512 MB DIMMs.


============================ Memory Configuration ============================
Segment Table:
-----------------------------------------------------------------------
Base Address       Size       Interleave Factor  Contains
-----------------------------------------------------------------------
0x0                2GB               4           BankIDs 0,1,2,3

Bank Table:
-----------------------------------------------------------
           Physical Location
ID       ControllerID  GroupID   Size       Interleave Way
-----------------------------------------------------------
0        0             0         512MB      0,1,2,3
1        0             1         512MB
2        0             1         512MB
3        0             0         512MB
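
The 2 GB total can be cross-checked by adding up the Size column of the Bank Table. The sketch below assumes bank rows look like the ones above (a numeric ID in the first column and a size such as 512MB in the fourth). The function name sum_bank_mb is made up for the example.

```shell
# Sketch only: total the bank sizes from the Bank Table, in megabytes.
# sum_bank_mb is a hypothetical helper; it reads prtdiag output on stdin.
sum_bank_mb() {
    awk '
        # Bank rows: numeric ID in column 1, size like 512MB in column 4.
        $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+(MB|GB)$/ {
            size = $4 + 0                  # leading number, e.g. 512
            if ($4 ~ /GB$/) size *= 1024   # normalize GB to MB
            total += size
        }
        END { print total + 0 }
    '
}
```

Run as "prtdiag | sum_bank_mb", the four 512MB banks shown above would add up to 2048, matching the 2GB segment.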

With the verbose (-v) option, prtdiag will produce roughly twice as much output. The verbose option adds fan speeds plus temperature, voltage and current sensors to the display. It also reports the state of the system LEDs in a table such as the one shown below. As you can see, each LED has a state (ON or OFF) and a color. Amber LEDs are meant to indicate problems, so an LED which shows "ON" and "amber" would indicate a problem to be looked into.


--------------------------------------------------
Led State:
--------------------------------------------------
Location   Led                   State       Color
--------------------------------------------------
-          fault                 OFF         amber
-          power                 ON          green
-          locator               OFF         white
-          top_access            OFF         amber
-          alarm1                OFF         amber
-          alarm2                OFF         amber
-          system                ON          green
-          supplyA               ON          green
-          supplyB               ON          green
DISK0      fault                 OFF         amber
DISK0      power                 ON          green
DISK0      ok_to_remove          OFF         blue
...

The prtdiag output might display ON/OFF in either uppercase or lowercase, so you could compose a quick, case-insensitive check of your LEDs like this:


# prtdiag -v | grep -i on | grep amber
MB SERVICE ON amber

In this example, we see that an amber LED on the motherboard is lit, indicating a fault of some kind.
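
The grep pipeline is a handy first pass, but "grep -i on" matches the letters "on" anywhere in a line, not just in the State column. A slightly stricter variant, sketched below, compares the last two columns directly. It assumes LED rows end with the state and the color, as in the table above. The function name lit_amber_leds is hypothetical, and on Solaris you may need nawk or /usr/xpg4/bin/awk in place of awk for toupper and tolower.

```shell
# Sketch only: print LED rows whose state is ON and whose color is amber.
# lit_amber_leds is a hypothetical helper; it reads prtdiag -v output on stdin.
lit_amber_leds() {
    awk '
        # The last field is the color, the one before it is the state.
        NF >= 3 && toupper($(NF - 1)) == "ON" && tolower($NF) == "amber"
    '
}
```

It would be used as "prtdiag -v | lit_amber_leds"; any line it prints is an amber LED that is currently lit.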

While the prtdiag command is extremely useful, the prtdiag man page is woefully inadequate. It barely explains the two available command options and doesn't provide any detail on how you should interpret the command's output. It doesn't even explain what acronyms like "HDD0" and "PS0" represent. To dispel the mystery of prtdiag's output, here's a list of the acronyms I've decoded to date ("?" representing an integer unit number, such as the "0" in "HDD0"):


B? bus on I/O assembly
C? I/O card in the I/O assembly
F? fan
FT? fan tray
HDD? disk drive
IB? I/O assembly (slot)
MB? motherboard
P? port on I/O assembly
PCI? PCI board
PS? power supply
RP? repeater board
SB? CPU/memory board slot (system board)
SSC? system config card
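
The list above can be turned into a small lookup for use in scripts. This is a sketch built only from the acronyms in the list. The function name decode_location is made up for the example, and it assumes a POSIX shell such as ksh or bash rather than the older Bourne shell.

```shell
# Sketch only: translate a prtdiag location label into a description.
# decode_location is a hypothetical helper based on the list above.
decode_location() {
    # Drop the unit number and anything after it: HDD0 -> HDD, PS1 -> PS.
    prefix=${1%%[0-9]*}
    case $prefix in
        B)   echo "bus on I/O assembly" ;;
        C)   echo "I/O card in the I/O assembly" ;;
        F)   echo "fan" ;;
        FT)  echo "fan tray" ;;
        HDD) echo "disk drive" ;;
        IB)  echo "I/O assembly (slot)" ;;
        MB)  echo "motherboard" ;;
        P)   echo "port on I/O assembly" ;;
        PCI) echo "PCI board" ;;
        PS)  echo "power supply" ;;
        RP)  echo "repeater board" ;;
        SB)  echo "CPU/memory board slot (system board)" ;;
        SSC) echo "system config card" ;;
        *)   echo "unknown"; return 1 ;;
    esac
}
```

For example, "decode_location PS0" prints "power supply", and a label that is not in the list prints "unknown" and returns a non-zero status.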

1 comment

    Anonymous 3 years ago
    Be careful if you use Avocent KVMs with USB keyboard dongles to manage your Sun machines. The prtdiag command tries to query the USB ports and gets an invalid response back when the Avocent dongle is connected. Needless to say, prtdiag dies when this happens and you get no information on your disk & power supply health. If you disconnect the Avocent dongle, prtdiag reports information correctly.
