Digesting log data

April 22, 2009, 08:26 AM, ITworld

Reducing voluminous log data to a size that can be read and understood in a matter of minutes can make the difference between systems administrators reviewing their logs routinely and reviewing them only when a problem has become so noticeable that analysis is unavoidable.

I insist on two criteria when digesting log data. The first is that no messages be omitted. If I only look for messages that I know to be potential problems (like those that include the word "warning"), I can easily overlook other problems of immediate or emerging importance. The second is that the digest include a count of how many times each message has appeared. This gives me a sense of how severe each problem is.

I've written scripts to digest log data at various times in my career, using a variety of tools, but my most recent attempt, in Perl, has some advantages. One is that it works where similarly constructed shell scripts fail for lack of resources. Another is that the code itself is surprisingly simple.

The reason I turned to Perl is easy to explain. When I attempted to digest a particularly large log file on the command line using standard Unix utilities, my system balked, complaining that no space was left on the device, even though I had used the tersest, most lightweight command I could conjure. I sorted the file and piped the result to uniq -c. Because uniq only collapses adjacent duplicate lines, the sort is needed first; the -c option then prefixes each line with the number of times it occurred. This is what I got:

$ sort logfile | uniq -c
sort: write error while sorting: No space left on device

While this modest little Unix command will work for most files most of the time, my file was more than 800,000 lines long. When I replaced the command with a Perl script, I had my results in 12 to 20 seconds over repeated runs. A typical messages file takes only a few seconds.
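For comparison, the counting part of sort logfile | uniq -c can be sketched in Perl with a hash: each distinct line becomes a key, and its value is incremented every time that line reappears, so no sorted copy of the whole file is needed. That presumably sidesteps the temporary files sort(1) typically writes when its input is large. This is a minimal sketch, not the author's script: the subroutine name count_lines is hypothetical, and it reads the file line by line rather than into an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how many times each distinct line occurs.
# Only one hash entry is kept per distinct line, so memory grows with
# the number of unique lines, not the total number of lines.
sub count_lines {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        chomp $line;
        $count{$line}++;
    }
    return \%count;
}

# Run only when a file name is given, so the subroutine can be reused.
if (@ARGV) {
    open( my $fh, '<', $ARGV[0] ) or die "cannot open $ARGV[0]: $!\n";
    my $count = count_lines($fh);
    close $fh;

    # Print "count line" pairs in sorted order, roughly as sort | uniq -c would.
    printf "%7d %s\n", $count->{$_}, $_ for sort keys %$count;
}
```

Because every line of a raw log usually carries a unique timestamp, this version alone compresses very little; the digit-collapsing step described below is what makes the counts useful.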

The primary "trick" in this Perl script is making good use of arrays, both ordinary and associative.

The first thing the script does is check that exactly one argument, the log file name, was given on the command line and assign that name to a variable.

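#!/usr/bin/perl

# Exit with a usage message unless exactly one argument (the log file name) was given
if ( $#ARGV != 0 ) {
    print "usage: $0 logfile\n";
    exit 1;
}
$logf = $ARGV[0];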

In the following lines, we read the log file into an array. We then replace each run of digits with a single pound sign, which reduces the uniqueness of our data and increases the level of compression. Dates and times, for example, are reduced to strings that all look the same (e.g., "12: Nov # #:#:# boson su: 'su root' failed for demian on /dev/pts/#"). We also count how many times each resulting pattern appears. For this part of the process, we use an associative array, an array whose index is a string value rather than a simple numeric sequence. At the end of this section, we have a single array element for each message type: the index is the message string itself and the value is its count.
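A minimal Perl sketch of the steps just described might look like this. It assumes that each run of digits (not each single digit) becomes one pound sign, as the example line suggests; the subroutine name digest is hypothetical; and the "count: message" output, sorted by descending count, is modeled on the example line rather than taken from the original script. The total check is an added safeguard for the "no messages omitted" rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse each run of digits to a single '#'
# and count how often each resulting message pattern occurs.
sub digest {
    my @lines = @_;    # work on a copy so the caller's lines are untouched
    my %count;         # associative array: message pattern => count
    for my $line (@lines) {
        chomp $line;
        $line =~ s/\d+/#/g;    # "08:26:33" becomes "#:#:#"
        $count{$line}++;
    }
    return %count;
}

if (@ARGV) {
    my $logf = $ARGV[0];
    open( my $fh, '<', $logf ) or die "cannot open $logf: $!\n";
    my @lines = <$fh>;    # read the whole log into an array
    close $fh;

    my %count = digest(@lines);

    # Assumed safeguard: the counts must add up to the number of lines read,
    # so no message has been dropped from the digest.
    my $total = 0;
    $total += $_ for values %count;
    die "digest lost lines\n" unless $total == @lines;

    # Most frequent patterns first; ties broken alphabetically.
    for my $msg ( sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count ) {
        print "$count{$msg}: $msg\n";
    }
}
```

Reading everything into @lines mirrors the approach described above; for very large logs, looping over the filehandle instead would keep memory proportional to the number of distinct patterns.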
