Problem notification on small, local networks is easy - just listen for the complaints from your co-workers. On large, global networks, an enterprise-level network management platform can keep watch. But for a midsize network (up to several hundred nodes), you'll need one of these monitoring and alerting software packages. While these tools aren't as feature-laden as ones from Tivoli, Computer Associates or Hewlett-Packard, they can automatically detect network connectivity problems, Windows NT operating system problems or both. In fact, two of the tools we reviewed can only monitor IP-addressable devices and SNMP-manageable nodes (see related story). The other seven offer varying degrees of connectivity surveillance and server health monitoring.
Some monitoring and alerting tools can tell you - via e-mail, pager, SNMP alert or other means - that a server's CPU is overloaded, its memory resources are nearly exhausted, its free disk space is precariously low, or that some other error condition exists. In some cases, these tools can reboot a server, restart a service or take other corrective action without your intervention. For connectivity, a tool might ping IP-addressable devices or poll SNMP-manageable devices to alert you to network failures.
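To make the server health side of this concrete, here is a minimal sketch of threshold-based checking. It is not taken from any of the reviewed products: the metric names, the limits and the `check_server` function are all assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threshold:
    metric: str       # name of the reading, e.g. "cpu_percent" (hypothetical)
    limit: float      # value at which an alert should fire
    alert_when: str   # "above" or "below" the limit

# Illustrative limits only; a real tool would let the administrator set these.
DEFAULT_THRESHOLDS = [
    Threshold("cpu_percent", 90.0, "above"),
    Threshold("memory_free_mb", 32.0, "below"),
    Threshold("disk_free_percent", 10.0, "below"),
]

def check_server(name, readings, thresholds=DEFAULT_THRESHOLDS):
    """Return one alert string per threshold that the readings breach."""
    alerts = []
    for t in thresholds:
        value = readings.get(t.metric)
        if value is None:
            continue  # metric was not collected for this server
        if t.alert_when == "above":
            breached = value > t.limit
        else:
            breached = value < t.limit
        if breached:
            alerts.append(f"{name}: {t.metric} is {value} ({t.alert_when} limit {t.limit})")
    return alerts

alerts = check_server(
    "srv1",
    {"cpu_percent": 97.0, "memory_free_mb": 512.0, "disk_free_percent": 4.0},
)
for line in alerts:
    print(line)
```

The returned strings would then be handed to whatever notification channel is configured, such as e-mail or a pager gateway.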
Using one of these tools, you get notification of problems earlier than if you waited for users to complain. Remember that the earlier you deal with a problem, the smaller it tends to be.
Our evaluation focused on Windows-based products that notify network administrators about connectivity and server health problems. We tested nine products: Ipswitch's WhatsUp Gold 5.0, Ripple Technologies' LogCaster 2.6, NetIQ's AppManager 3.4, Breakout Technologies' MonitorIT 2.0, NTP Software's System Sentinel 2000, MediaHouse Software's ipMonitor 6.0, Heroix's RoboMon NT 7.5, Dartware's InterMapper 3.0 and Argent Software's Guardian 4.1A.
Our World Class Award goes to ipMonitor, which proved to be a superior and intelligent problem notification aid for our network. Its monitoring and reporting features, together with its slick user interface, impressed us greatly. AppManager and RoboMon NT deserve special mention. Their ability to closely monitor the Windows operating system rivaled ipMonitor's.
nIP those problems
MediaHouse's ipMonitor watches Windows NT/2000 machines for excessive resource consumption (via NT Performance Monitor data), failed NT services and specific event log entries. IpMonitor supervises particular applications, such as Oracle and SQL Server, and it also pings IP-addressable network devices to check for availability and polls them via SNMP to discover their status. The network scanning feature for discovering IP-addressable network devices, like the one WhatsUp Gold uses, is quick and accurate. IpMonitor further impressed us by noting changes to files on our servers, verifying Web page links and maintaining surveillance on our Win 2000 Active Directory data.
In addition to e-mail, pager and SNMP alerts, ipMonitor can also alert an ICQ chat account, issue a network broadcast message, issue a help desk trouble ticket via third-party software or insert an entry into the NT event log. It offered the most corrective action options, including rebooting a server, restarting an NT service or launching a program, batch file or third-party diagnostic utility.
IpMonitor's reports are highly useful and configurable. The supplied report templates include Availability, Response Time, Downtime, Diagnostic Analysis, and Health and Trouble reports. For two user-selectable time intervals (current and historical), the reports show historical activity and can predict trends. Administrators can designate one or more e-mail address destinations for each report. IpMonitor stores monitoring parameters and statistical performance data in its internal database.
The ipMonitor Web-based interface is well-designed and slick. IpMonitor's use of colors to indicate the status of a monitored device or parameter (green for OK, yellow for a new problem, red for a problem for which notifications have been issued and dark red for older problems for which ipMonitor has already transmitted several alerts) provides a great deal of information to anyone who glances at the Monitor Status window. Through ipMonitor's Web pages, we could choose monitoring tasks to run, indicate the schedule on which they should run, and view ipMonitor's reports and status displays.
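The four-color scheme amounts to a small decision table. The sketch below is one way to express it, not ipMonitor's actual logic: the `status_color` function and the cutoff of three alerts for "several" are assumptions.

```python
def status_color(healthy: bool, alerts_sent: int, several: int = 3) -> str:
    """Map a monitor's state to a display color.

    `several` is a hypothetical cutoff for "already transmitted several alerts".
    """
    if healthy:
        return "green"      # everything OK
    if alerts_sent == 0:
        return "yellow"     # new problem, nobody notified yet
    if alerts_sent < several:
        return "red"        # problem for which notifications have been issued
    return "dark red"       # older problem, several alerts already sent

for state in [(True, 0), (False, 0), (False, 1), (False, 5)]:
    print(state, "->", status_color(*state))
```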
Installation was a breeze. The printed documentation is a simple "Getting Started" booklet, but the online help is comprehensive, clear and context-sensitive.
An application perspective
The monitoring agents of NetIQ's AppManager impressed us by detecting broken trust relationships across Windows NT/2000 domains, identifying hung and terminated NT/2000 services, noting which applications caused excessive CPU or memory usage, and tracking the number of open shared files. It discovered which users consumed the most SQL Server resources (CPU, memory, disk I/O and locks), and it monitored Exchange throughput and Internet Information Server (IIS) connections and session timeouts. Via SNMP, AppManager discovered and monitored our network devices and non-Windows computers.
Like ipMonitor, AppManager could monitor Windows NT/2000's performance parameters, event logs, registry, services and specific applications running on NT/2000, such as Oracle, SQL Server, Exchange, Lotus Domino, Citrix MetaFrame, Citrix WinFrame and SAP R/3. Its easy-to-understand script language for creating rules and defining behaviors is a dialect of Microsoft's powerful and popular Visual Basic.
AppManager can notify administrators of problems via e-mail, pager calls or SNMP alerts. But it doesn't send network broadcast messages, nor does it issue help desk trouble tickets through third-party software. In addition to its other corrective actions, we liked AppManager's ability to interact with a database through Open Database Connectivity (ODBC).
We found setting up AppManager's real-time reporting of performance factors especially easy. By merely dragging and dropping script icons onto target computers in the Win32 console's tree view, we quickly defined several reports that AppManager immediately began running. The real-time graphs of system activity also let us drill down for details, such as which SQL statements were causing excessive CPU usage in SQL Server. AppManager's reports also showed system status and inventory data. For example, we could see exactly which files were being shared by each server, as well as the IP addresses of network adapters installed in the NT machines. AppManager can produce more than 200 reports.
The Win32 native console and browser-based interface offered the same functionality and were easy to navigate. Each displayed information clearly and intuitively. The multithreaded Win32 console, we noted, was amazingly responsive during our tests.
AppManager's installation process is almost child's play. However, it requires that you already have SQL Server 6.5 or 7.0, which adds considerably to the total cost of the product and caused us to lower the product's installation score-card entry. The printed documentation is clear and professionally done, as is the online help. However, the printed documentation defers to the online help in many places, especially with respect to setting up corrective actions.
Hey, mon! A network repair robot
Although architecturally different, NetIQ's AppManager and Heroix's RoboMon NT have similar features, strengths and weaknesses. We gave RoboMon NT lower scores because it didn't monitor quite as many applications as AppManager, its script language was less powerful and the printed documentation wasn't as good.
RoboMon NT monitored many aspects of our Windows NT/2000 servers and clients. It even took steps to correct some problems before they affected our network. For example, when we flooded a shared printer queue with print jobs in one test, RoboMon NT redirected a portion of the print jobs to an alternate printer queue to head off what otherwise would have been a stand-around-the-printer-waiting-for-my-printout bottleneck. (We enabled pop-up print job completion notifications to send people to the right destination printer.) However, RoboMon NT can't monitor Exchange 2000, SAP R/3 or Lotus Domino R5.
Configuring rules to direct RoboMon NT's behavior is simple and painless, but the script language for creating monitoring rules is weak and unsophisticated. The product includes useful default rules for diagnosing problems, checking system statistics and monitoring network performance.
In another test of RoboMon NT's ability to keep our network up and running, we overloaded an NT machine with several concurrent tasks. RoboMon NT, using one of its many configurable rules, displayed the clear, descriptive message, "The number of processes on the system is too high." Just as unambiguously, it told us our Dynamic Host Configuration Protocol address pool was almost empty.
For the overall network beyond NT servers, we were disappointed that RoboMon NT monitored just Cisco network devices. RoboMon also runs on Unix and OpenVMS, but we didn't test those versions.
Rule-triggered behaviors can consist of notification, corrective action, rule modification and variable modification.
Rule modification can enable, disable or reschedule other rules. Variable modification changes one or more of RoboMon NT's internal settings, which then dynamically affect other rules.
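The interplay among these behaviors is easier to see in a toy model. The following sketch is not RoboMon NT's script language or architecture. The `Rule` and `Engine` classes, the rule names and the limits are all hypothetical. It only shows how one rule firing can notify, enable another rule and change a variable that later evaluations depend on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    # condition receives (readings, variables) and returns True when the rule fires
    condition: Callable[[Dict[str, float], Dict[str, float]], bool]
    actions: List[Callable[["Engine"], None]] = field(default_factory=list)
    enabled: bool = True

class Engine:
    def __init__(self, variables: Dict[str, float]):
        self.rules: Dict[str, Rule] = {}
        self.variables = dict(variables)
        self.log: List[str] = []

    def add(self, rule: Rule) -> None:
        self.rules[rule.name] = rule

    def notify(self, message: str) -> None:          # notification
        self.log.append("NOTIFY: " + message)

    def set_enabled(self, name: str, on: bool) -> None:    # rule modification
        self.rules[name].enabled = on

    def set_variable(self, key: str, value: float) -> None:  # variable modification
        self.variables[key] = value

    def run(self, readings: Dict[str, float]) -> None:
        # Rules are evaluated in the order they were added.
        for rule in list(self.rules.values()):
            if rule.enabled and rule.condition(readings, self.variables):
                for action in rule.actions:
                    action(self)

engine = Engine({"cpu_limit": 90.0})
engine.add(Rule(
    "cpu_high",
    condition=lambda r, v: r["cpu_percent"] > v["cpu_limit"],
    actions=[
        lambda e: e.notify("CPU above limit"),
        lambda e: e.set_enabled("process_audit", True),
        lambda e: e.set_variable("cpu_limit", 95.0),
    ],
))
engine.add(Rule(
    "process_audit",
    condition=lambda r, v: r["process_count"] > 200,
    actions=[lambda e: e.notify("Too many processes")],
    enabled=False,  # dormant until cpu_high switches it on
))

engine.run({"cpu_percent": 93.0, "process_count": 250})
print(engine.log)
```

A corrective action would slot in as just another callable in a rule's `actions` list, for example one that launches a batch file.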
We found we could have a rule keep an eye on virtually any NT activity we wished. These activities included NT Performance Monitor items, event log entries, Oracle or SQL Server relational database changes, application log file entries, Component Object Model changes and Windows Management Instrumentation events. RoboMon NT even distinguished between chronic problems, persistent problems and new ones. It stores rules and the network data it collects in a Microsoft Access database it creates by default. Alternatively, you can use SQL Server.
RoboMon NT's reporting tools, Report Manager and Graph Manager, present numerous ways to view current and historical network activity. The Graph Manager displays relationships between statistics that RoboMon NT collects, and Report Manager offers reports by group (disk, event, Exchange, Internet, process and system). For example, the system report group menu consists of summary, cache, physical memory, page file, CPU usage, CPU rates, file I/O, TCP/IP, client/server, Web server and SQL Server reports. However, RoboMon NT lacks a trend analysis reporting function.
RoboMon NT's user interface is intuitive and easy to master. Accessing RoboMon NT's dispersed network components via a central Enterprise Manager module makes a network administrator highly productive.
Installation takes between 5 and 10 minutes, during which you have to tell RoboMon NT what each agent is monitoring. The printed documentation, consisting of merely a start-up manual, is of poor quality. Paradoxically, the online documentation is comprehensive and clear.
A guardian angel
Like ipMonitor, RoboMon NT and AppManager, Argent Software's Guardian is essentially a Windows watcher that can monitor many different Windows NT/2000 parameters and applications. Like Ripple Technologies' LogCaster, Guardian can also keep an eye on Linux computers. To discover and monitor network devices such as routers and switches, it can also send IP pings and SNMP polling requests across the network.
However, Guardian's corrective action feature was more difficult to configure, it has a dearth of built-in reports and the documentation wasn't as good. And because it starts at $9,000 for 10 servers, Guardian is expensive for small networks.
The monitoring process is completely configurable and uses an object-oriented, rule-based design. We liked Guardian's scheduling feature for running recurring monitoring tasks. The scripting language is proprietary and quite unlike the script languages within RoboMon NT, AppManager and LogCaster. Each rule set specifies how Guardian should perform its problem detection. A rule set can have up to eight classes: event, performance, program, service, SNMP alert, command, system down and printer. Creating a working rule set is a matter of selecting rules within each class and choosing one or more notification methods.
The Guardian Predictor module's built-in reports consist of just a service-level agreement monitoring report and system resource report. However, Guardian stores its monitoring parameters, performance data and network device status information in its ODBC-accessible database. Guardian suggests the use of a reporting tool such as Seagate Software's Crystal Reports to depict its data. In contrast, NetIQ's AppManager has more than 200 built-in reports, which made AppManager much more useful out of the box.
The native Win32 user interface is easy to navigate and simple to operate. Guardian's Web interface can display server status information and the product's limited set of built-in reports, but it lacks the Win32 console's ability to configure Guardian. In addition, the Web interface requires IIS and doesn't work with Netscape or Apache Web servers.
Installation was straightforward. The documentation, a downloadable collection of PDF and Word for Windows document files, is comprehensive but lacks polish. In places, it's cryptic and seems to have been written by a software engineer.
Halt! Who goes there?
NTP Software's System Sentinel 2000 is a Windows watcher with a twist - pretending to be a user (using account IDs and passwords you specify), it can log on to remote servers to check accessibility as well as availability. It can also propagate System Sentinel configuration changes across multiple System Sentinel servers.
Like Heroix's RoboMon NT and NetIQ's AppManager, System Sentinel tracks Windows NT/2000 event log entries, services and Performance Monitor statistics. Via IP pings and SNMP polling, System Sentinel also monitors the health of network devices. However, it doesn't track specific applications running on NT/2000 or changes to the registry.
When System Sentinel detects a problem, it can send e-mail, call a pager, update a relational database via ODBC or transmit an SNMP alert. Its corrective action options include launching a program or batch file, restarting a failed service, starting or stopping remote services and rebooting the computer.
By the time you read this, NTP Software says it will have incorporated a new reporting tool in System Sentinel. In the version we tested, the new report module provided a detailed history of Windows NT/2000 events that we could sort by event type, date, source, category, computer and other criteria. The new reporting tool also has a primitive trend analysis reporting capability.
The user interface is Explorer-like, with an expandable/collapsible list of System Sentinel configuration entries on the left and, for selected configuration entries, detailed information on the right. There is no Web-based interface.
Installation is simple. The printed documentation consists of just an installation and quick-start guide, and the online help is incomplete and sometimes confusing.
Casting for network problems
Ripple Technologies' LogCaster does a good job of monitoring Windows NT/2000 event logs, performance data, TCP/IP-based devices and certain services. Using filters that took us just minutes to set up, LogCaster selected and consolidated entries in our event logs and then notified us when problems occurred. Its monitoring of NT/2000 services and performance data similarly produced notifications when, for example, a particular service unexpectedly died or CPU utilization approached 100%. At intervals we specified, LogCaster's TCP/IP Watcher pinged our routers, Web servers and TCP/IP clients to identify network failures.
LogCaster also has a Syslog Watcher monitoring tool for spotting problems as they crop up in Unix or Linux log files. Syslog Watcher works by translating User Datagram Protocol (UDP) messages it receives on Port 514 into NT event log entries. LogCaster includes plug-ins for monitoring Citrix MetaFrame servers and Check Point Software Firewall-1 servers.
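The translation step can be sketched in a few lines. This is an illustration of the general idea, not LogCaster's implementation: the `syslog_to_event` function, the output dictionary and the mapping from syslog severity to an NT-style event type are assumptions. The priority arithmetic (facility times 8 plus severity, with 13 assumed when no priority is present) follows the classic BSD syslog convention described in RFC 3164.

```python
import re

# Matches the leading "<PRI>" field of a BSD-style syslog message.
_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)

def _event_type(severity: int) -> str:
    """Hypothetical mapping from syslog severity (0-7) to an NT-style event type."""
    if severity <= 3:       # emergency, alert, critical, error
        return "Error"
    if severity == 4:       # warning
        return "Warning"
    return "Information"    # notice, informational, debug

def syslog_to_event(datagram: bytes, sender: str) -> dict:
    """Turn one raw syslog datagram into an event-log-like record."""
    text = datagram.decode("utf-8", errors="replace").strip()
    match = _PRI.match(text)
    if match:
        priority, body = int(match.group(1)), match.group(2)
    else:
        priority, body = 13, text   # no PRI field: treat as user.notice
    facility, severity = divmod(priority, 8)
    return {
        "source": sender,
        "facility": facility,
        "severity": severity,
        "event_type": _event_type(severity),
        "message": body,
    }

# A real listener would read datagrams like this one from a UDP socket
# bound to port 514; here the bytes are supplied directly.
event = syslog_to_event(
    b"<34>Oct 11 22:14:15 mymachine su: 'su root' failed", "10.0.0.5"
)
print(event["event_type"], "-", event["message"])
```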
In all cases, LogCaster's monitoring agents insert entries in the NT/2000 event logs. Selected log entries then trigger LogCaster's problem notification process.
How We Did It
We tested the products' ability to monitor the health and availability of our servers and network devices, as well as the ability to resolve a problem automatically.
We tested the sending of alerts by pager, single or multiple recipient e-mail, Web page or pop-up dialog boxes to notify us of network outages. We further expected a product to produce reports that helped establish baselines, show available and unavailable devices, log device availability histories, identify trends and spot future problems.
We noted whether a product checks for TCP/IP port device availability and monitors TCP/IP services such as SMTP, HTTP and telnet. We also noted whether a product uses SNMP to retrieve details about a device, makes use of Windows NT auditing information, and monitors NT services and events. Network device inventory was important, as was the ability to reveal server or client CPU usage, disk space and memory consumption.
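For readers unfamiliar with what a TCP port availability check involves, the sketch below shows the basic idea: try to open a connection and report whether it succeeds. It is a generic illustration, not how any reviewed product works. The function names and the two-second timeout are assumptions. The port numbers are the standard well-known ports for SMTP, HTTP and telnet.

```python
import socket

# Standard well-known ports for the services named in the text.
WELL_KNOWN = {"smtp": 25, "http": 80, "telnet": 23}

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_services(host: str, services: dict, timeout: float = 2.0) -> dict:
    """Check each named service port and return {service name: reachable?}."""
    return {
        name: port_is_open(host, port, timeout)
        for name, port in services.items()
    }
```

Calling `check_services("mail.example.com", WELL_KNOWN)` (the host name is a placeholder) would return a dictionary of service names mapped to True or False. A successful connection shows only that something is listening on the port, not that the service behind it is answering correctly.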
Our test network consisted of six Fast Ethernet subnet domains routed by Ascend routers. Our platforms included Windows NT/95/98/2000, Unix (AIX 4.3), Red Hat Linux 6.2, Novell NetWare 5.1, Macintosh System 8 and OS/2 Warp 4.0. Relational databases on the network were Oracle 8i, Sybase Adaptive Server 11.5 and Microsoft SQL Server 7.0. Windows NT and NetWare shared files, while Internet Information Server, Netscape and Apache software served up Web pages. The network's protocols were TCP/IP, IPX, AppleTalk and SNA.
All the alerting and monitoring products, except for Dartware's InterMapper, ran on a Gateway NS-8000 computer with dual 333-MHz Pentium II processors, 512M bytes of RAM and three 9G-byte SCSI RAID drives. InterMapper ran on a Macintosh G3 server. Network Associates' Sniffer protocol analyzer software with Database Module Options for Sybase and Oracle, running on a Dolch PAC63 computer, generated packets and decoded network traffic. Gambit Communications' MIMIC 4.3 simulated from 10 to 1,000 SNMP-aware devices.
LogCaster's focus is quite narrow. For example, the service monitoring agent only works with Exchange 5.5, IIS 4.0, Proxy Server 2.0 and SQL Server 7.0. It doesn't monitor for Oracle, Active Directory, SNA Server, SAP R/3, Terminal Service, Citrix WinFrame or Lotus Domino. Furthermore, if a problem doesn't manifest itself in the Windows NT/2000 event log (either on its own or through a LogCaster agent), LogCaster won't see it.
LogCaster can send problem notifications via e-mail, pager, SNMP alert and, via ODBC, a relational database. Its corrective-action feature can launch a program or batch file to fix a specific problem. Ripple Technologies also provides a "restart this service" repair option.
LogCaster's reports are line graphs depicting one or more monitored statistics, such as available memory bytes per server, for a specific time period. Unfortunately, LogCaster's reporting tool can't produce list- or tab-format reports.
We liked the ability to define Business Groups in LogCaster. We used this feature to designate whether an NT machine was in a System Group or a particular Business Group, then produced reports (graphs) that distinguished monitored statistics by type of machine.
The Win32 console's user interface is straightforward and intuitive, and its dialog boxes are especially easy to understand and navigate. The telnet interface for remote administration is functional but far from elegant. LogCaster doesn't have a browser-based interface.
LogCaster's installation process is simple and doesn't require rebooting NT or Windows 2000. The printed documentation is thorough and clear.
Like ipMonitor, RoboMon NT and AppManager, Breakout Technologies' MonitorIT monitors Windows-based computers. However, its server component runs on Windows 95/98 as well as Windows NT/2000. It can also ping IP-based network devices to determine connectivity and availability, and it can poll servers to monitor for HTTP, FTP, SMTP, POP3 and Domain Name System activity. From all the Windows machines on the network, MonitorIT gathered the performance and configuration data we selected from its menus. The menu items we could choose from included cache, Distributed Transaction Coordinator, HTTP indexing service, memory and network interface.
MonitorIT's notification methods include sending e-mail, calling a pager and launching a program to take corrective action (the program runs on the MonitorIT server, although the problem may have occurred on a different machine). MonitorIT doesn't transmit SNMP alerts.
For our Windows machines and IP-addressable devices, the reporting tool gave us excellent real-time and historical views of availability, performance and selected monitored events. Using the predefined report templates showed us NT Server general performance, file server performance, IIS performance and other data. The product's built-in inventory reports listed IP-addressable computers and MonitorIT configuration data.
MonitorIT's score-card value for management is lower because its native Win32 interface is not intuitive. When defining an alert (for example, setting up a threshold to be monitored), the program starts off in a "review mode." You must click on New or Edit buttons to enter alert parameters. Furthermore, the interface expects you to hover the mouse cursor over what it terms Counters and Objects to see a selectable list of those Counters and Objects. In contrast, the product's Web interface is much easier to use and offers remote access to all MonitorIT's administration and reporting functions.
MonitorIT installation is complicated only by the need to choose whether to use SQL Server or MonitorIT's internal database to store the monitoring thresholds and performance data. The printed documentation and online help are adequate.
Network administrators deserve a useful task automation tool that can keep a watchful eye on servers and network devices. For keeping small to midsize Windows-based networks up and running, we strongly recommend MediaHouse Software's ipMonitor. It can shoulder a considerable portion of a network administrator's NT server monitoring burden as well as quickly detect network device connectivity problems. NetIQ's AppManager and Heroix's RoboMon NT are also excellent early-warning tools for discovering and dealing with Windows server errors and network issues.
This story, "Windows watchers," was originally published by Network World.