August 08, 2008, 2:29 PM — There's more at stake than lost productivity when application response times slow to a standstill. Company revenue also takes a hit.
Aberdeen Group recently surveyed 200 organizations and found that application-performance issues can cut overall corporate revenue by as much as 9%.
"No one is safe today from the negative impact of poor application performance, whether you're a gamer, a retail outlet or a Salesforce.com user," says Jasmine Noel, principal analyst at Ptak, Noel & Associates. "The question is how much money do you have to lose if the application starts crawling along at traffic-jam speeds?"
On the one hand, multi-tiered applications help companies do better business, but on the other, the complexity of the environment in which they reside challenges network managers looking to prevent problems before they reach users and customers. Adding to the problem is the growing adoption of technologies such as virtualization, VoIP and service-oriented architecture, which require sophisticated environments that can hinder troubleshooting when problems arise.
"In the past, we were managing the infrastructure, which really doesn't get into how the application is performing for the end users," says Jason Norton, director of operations and telecommunications at media and marketing company Scripps Networks in Knoxville, Tenn. "We need to be able to be aware of and see all of the pieces that make up an application and how they affect the end user to understand when performance is going to be impacted, he says."
Here we analyze three scenarios in which application-performance problems could elude network managers.
Can you hear me now?
Symptom: VoIP calls begin to experience poor quality and latency, some even dropping altogether.
When Koie Smith, IT administrator at Jackson, Tenn., law firm Rainey, Kizer, Reviere & Bell, noticed VoIP calls performing inconsistently across the network, he first tried to trace the problem to a specific port.
"We have had instances in which a performance problem would occur because something had been put on the network, such as a network-interface card or a port, that causes a problem with an application, such as voice," Smith explains.
But the performance issues with the application couldn't be traced back to a specific port. Smith began looking further into the QoS settings he had established when he rolled out voice traffic and discovered why the voice users were suffering: Undefined priority tags on voice packets across multiple switches meant that only some of the traffic was given priority, which resulted in spotty performance.
The solution? Updating the QoS tags and specifically defining the priority tags for voice traffic across all network switches.
"From the network side of it, it is critical to define tagging and assign priorities on the switch for such specific traffic as voice or video," Smith says. "A lot of switches will identify and acknowledge that QoS tag, but if you don't also have a priority tag on each switch - even if you have it tagged at the source - the switch just dumps that traffic in with all general traffic, and it doesn't get the allocated bandwidth it needs to perform well."
Application, heal thyself
Symptom: E-mail grinds to a halt on a user workstation, but by the time the help desk attempts to fix the problem, it seemingly has resolved itself. A few weeks later, another user reports a similar concern.
Tracking down the source of transient problems is one of the more challenging tasks for network managers, Noel says.
For one, many troubleshooting techniques require network managers to capture data about what was happening at the time of the problem. In addition, most minor problems that occur intermittently point to a larger underlying issue that IT must resolve before the application goes from acting spotty to failing altogether.
"First, you have to notice this has happened and record it, so that when it happens again, you know it deserves attention," Noel says. "Then you have to catch it when it happens, so you can dissect it and prevent it from happening again."
Baselining typical application performance with monitoring and measurement tools can help network managers understand how an application normally behaves and set thresholds that trigger alerts when it begins to stray. By using probes that capture traffic and packet data, network managers can go back and reconstruct these incidents after they occur and look for shared traits - a misconfigured server or a poorly designed application, for example.
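The baseline-and-threshold idea can be sketched in a few lines: keep a rolling window of response-time samples and flag any sample that strays well beyond the established norm. The window size and the three-sigma cutoff below are illustrative choices, not recommendations from the article:

```python
import statistics
from collections import deque

class ResponseTimeBaseline:
    """Rolling baseline of application response times with a simple
    deviation alert. Window size and sigma cutoff are illustrative."""

    def __init__(self, window=500, sigmas=3.0, warmup=30):
        self.samples = deque(maxlen=window)  # recent response times (ms)
        self.sigmas = sigmas
        self.warmup = warmup  # minimum history before alerting

    def observe(self, response_ms):
        """Record one sample; return True if it strays from the baseline."""
        alert = False
        if len(self.samples) >= self.warmup:
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if response_ms > mean + self.sigmas * max(stdev, 1e-9):
                alert = True
        self.samples.append(response_ms)
        return alert
```

Feeding each measured transaction time through `observe` gives a cheap "is this normal for this application?" signal; real monitoring products layer seasonality and per-transaction baselines on top of the same idea.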
"You have to check configuration details, memory and utilization; and you have to do it for all of those mini-incidents to see the common thread that connects the instances to a larger problem in the environment," Noel says.
Scripps Networks' Norton uses NetQoS SuperAgent technology to collect conversations from across the network to pinpoint the cause of problems that crop up and seemingly disappear. The product enables him to do packet capture and perform SNMP polls to gather data about transactions, thresholds passed and application-response times.
"You almost have to have the problem occur to be able to troubleshoot it. NetQoS will show us different pieces across the network [when] a database was running slow or server processes took a particularly long time," Norton explains. "It helps us to say with confidence, 'At this point in time, this is what was happening.' And that speeds troubleshooting."
Location, location, location
Symptom: A file-sharing application performs well for some users, but others report problems trying to access and work with the same application.
"We have had problems that come up in one location that don't happen in the other location," Norton says. "And when you are dealing with the same application, it can be tough to translate why an application would be slow for one group and not the other."
The source could be misconfigured devices stalling application traffic, even across a LAN. For instance, DNS servers could resolve names differently for different users, sending application traffic down different paths and causing slowdowns for some while others experience no change in service.
"If DNS is not set up properly, applications will run pitifully slow because anything talking across the network is talking to it by name. If the name is not accurate, the IP address cannot be resolved and traffic will come to a halt," says Glenn O'Donnell, a senior analyst at Forrester Research
One solution is to implement a combination of application-dependency-mapping and configuration-management tools. These can help network managers understand which servers applications use to fulfill requests, and track how configurations might have changed or may differ among resources, leading to a slowdown.
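At its core, a dependency map is a graph: each component lists what it calls, and a walk of the graph yields every resource an application touches. A toy sketch (component names are invented for illustration):

```python
from collections import deque

# A toy dependency map: each component lists what it calls directly.
# Component names are illustrative, not from the article.
DEPENDENCIES = {
    "file-share-app": ["web01", "dns01"],
    "web01": ["db01", "dns01"],
    "db01": ["san01"],
    "dns01": [],
    "san01": [],
}

def resources_behind(app, dep_map):
    """Breadth-first walk of the dependency map: every server the
    application touches, directly or indirectly, to fulfill a request."""
    seen, queue = set(), deque([app])
    while queue:
        node = queue.popleft()
        for dep in dep_map.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```

When users of "file-share-app" report a slowdown, auditing recent configuration changes on just the resources this walk returns narrows the hunt considerably compared with sweeping the whole environment.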
"The brightest sleuths are often assigned to find these [configuration errors], but even they are now becoming overwhelmed by the complexity. It can take a number of days to hunt the problem down, and that can become time consuming and expensive," O'Donnell says. "We need to come up with better modeling tools to analyze all of the possible combinations of configuration settings. The needle keeps moving, and the haystack keeps getting bigger."
Aside from pinpointing the initial cause of a configuration error and correcting it, network managers should be establishing rigid change- and configuration-management policies (such as those detailed in ITIL) to make sure unauthorized changes don't result in major outages later on.
"If an organization has well-defined problem and incident-management processes, they can quickly detect a problem they haven't seen before and work to define how to handle it the next time it occurs," Ptak, Noel's Noel says. "Invariably the problem will happen again - maybe not tomorrow, but when everyone has completely forgotten about it. If the proper processes are in place, organizations can use proven methods to resolve such errors and even work to the fixes with tools."