by Sandra Henry-Stocker

Unix tip: Monitor disk arrays with sccli commands

December 4, 2008, 10:13 AM

One of the best features of disk arrays is that they keep working even when a disk has failed. The drawback is that you might not notice the failure and therefore fail to replace the disk in a timely manner. Let's take a look at what you can do to monitor your StorEdge storage arrays so that a bad disk doesn't escape your notice.

There are two ways to determine that a disk in a StorEdge array has failed. You might notice that the amber LED on the front of the drive has lit up, or you can use sccli commands to view the disks in your array and their status.

To start sccli, log in to the server to which the array is attached and type "sccli". You should connect to the device and find yourself at the sccli> prompt. To view the state of the disks in your array, type "show disks". In the display below, one of the disks is reported as "BAD". This system was still running and still had one disk in "STAND-BY" mode, so the failure was not an emergency. Still, it's a good idea to make sure that all disks in the array are working properly, to steer clear of failures from which your array could not recover without intervention.


sccli> show disks
Ch    Id  Size     Speed  LD      Status    IDs                      Rev
----------------------------------------------------------------------------
2(3)   0  68.37GB  200MB  ld0     ONLINE    FUJITSU MAW3073FCSUN72G  1303
                                            S/N 000640B0H8A7
                                            WWNN 500000E0130AC3A0
2(3)   1  68.37GB  200MB  ld0     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000408Q088GS
                                            WWNN 500000E01076CB30
2(3)   2  68.37GB  200MB  ld0     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000408Q08ALW
                                            WWNN 500000E010776B60
2(3)   3  68.37GB  200MB  ld0     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000408Q089V9
                                            WWNN 500000E0107729D0
2(3)   4  68.37GB  200MB  ld0     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000409Q08G32
                                            WWNN 500000E01078E900
2(3)   5  N/A      N/A    NONE    BAD       FUJITSU MAP3735F SUN72G  1701
                                            S/N 000409Q08G5Y
                                            WWNN 500000E01078EFF0
2(3)   6  68.37GB  200MB  ld1     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000408Q08FV1
                                            WWNN 500000E01078DBF0
2(3)   7  68.37GB  200MB  ld1     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000408Q08FNM
                                            WWNN 500000E01078CE70
2(3)   8  68.37GB  200MB  ld1     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000408Q089C1
                                            WWNN 500000E0107711D0
2(3)   9  68.37GB  200MB  ld1     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000409Q08G99
                                            WWNN 500000E01078F6A0
2(3)  10  68.37GB  200MB  ld1     ONLINE    FUJITSU MAP3735F SUN72G  1701
                                            S/N 000409Q08G15
                                            WWNN 500000E01078E550
2(3)  11  68.37GB  200MB  GLOBAL  STAND-BY  FUJITSU MAP3735F SUN72G  1701
                                            S/N 000409Q08G06
                                            WWNN 500000E01078E370

In this array, we can see that all the disks are 72 GB Fujitsu drives. Drives in the same chassis can be different sizes, but should all be running at the same speed.

Depending on the nature of a disk failure, your disk array could be working at reduced speed in order to compensate for the missing data.
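
Rather than reading the display by eye, you could scan it for the "BAD" status. The following is only a minimal sketch: it assumes the column order shown in the display above (Ch, Id, Size, Speed, LD, Status, IDs, Rev), it looks for "BAD" and no other status, and report_bad_disks is a hypothetical helper name rather than part of sccli.

```shell
#!/bin/sh
# Sketch: list the disks whose Status column reads BAD in "show disks" output.
# Assumption: columns appear as Ch Id Size Speed LD Status IDs Rev, so the
# status is the sixth whitespace-separated field on each disk row.

# report_bad_disks (hypothetical helper) reads "show disks" text on standard
# input and prints one line for every disk row marked BAD.
report_bad_disks()
{
    # Disk rows have a numeric Id in field 2; the header line and the
    # S/N and WWNN continuation lines do not, so they are skipped.
    awk '$2 ~ /^[0-9]+$/ && $6 == "BAD" { print "BAD disk: channel " $1 ", id " $2 }'
}

# Example run on two rows copied from the display above.
report_bad_disks <<'EOF'
2(3) 4 68.37GB 200MB ld0 ONLINE FUJITSU MAP3735F SUN72G 1701
2(3) 5 N/A N/A NONE BAD FUJITSU MAP3735F SUN72G 1701
EOF
```

For the two sample rows, the function reports only the disk with id 5. In practice you would pipe saved sccli output into report_bad_disks and mail yourself the result whenever it is not empty.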

Comments

Bug?

It would seem that you have a bug in your script:

sccli < /tmp/$$
show disks
show ld
exit
EOF

I suppose that "/tmp/$$" should contain
show disks
show ld
exit

Otherwise, the script will fail because "/tmp/$$" doesn't exist beforehand.

May I also suggest that you refrain from using bash? bash is full of bugs and usually not found on traditional UNIX systems, making it a poor choice for scripts (programs) which are meant to be, or should be, portable across systems. /bin/ksh is likely to be a good compromise between usability and portability.

Also, may I suggest you trap SIGINT, so that if your script is ever ^Ced during execution/debugging, the self-defined Cleanup() function will clean up "/tmp/$$" automatically.

EXAMPLE
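#!/sbin/sh

Rm="/bin/rm"
Self="`basename $0`"
TmpDir="/tmp"
TmpFile0="$TmpDir/${Self}.$$"

# Remove the temporary file.
Cleanup()
{
    $Rm -f "$TmpFile0"
}

# On SIGINT (signal 2), clean up and exit instead of resuming the script.
trap 'Cleanup; exit 1' 2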
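
The two suggestions in this comment can be combined in one short sketch: create the command file before sccli reads it, and remove it on the way out. It assumes, as the comment does, that sccli accepts its commands on standard input. The file name and the run_sccli_commands helper are hypothetical, not taken from the original script.

```shell
#!/bin/sh
# Sketch: write the sccli commands to a temporary file, feed that file to
# sccli on standard input, and remove the file afterward (also on ^C).

TmpFile="/tmp/sccli_cmds.$$"      # hypothetical file name

# Remove the temporary file.
Cleanup()
{
    rm -f "$TmpFile"
}

# run_sccli_commands (hypothetical helper). The optional argument names the
# program to run, so the flow can be tried with "cat" on a host without sccli.
run_sccli_commands()
{
    trap 'Cleanup; exit 1' 2      # SIGINT: remove the file, then stop

    # Create the command file first, so that it exists before it is read.
    cat > "$TmpFile" <<EOF
show disks
show ld
exit
EOF

    "${1:-sccli}" < "$TmpFile"
    Status=$?                     # keep the program's exit status
    Cleanup
    return $Status
}
```

Calling run_sccli_commands with no argument runs sccli itself. Calling it as "run_sccli_commands cat" simply echoes the three commands, which is a convenient way to check the temporary-file handling without an array attached.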