Retailers need to heed big data in their machine logs
Retailers place big bets on big data, digging to discover buying trends and preferences from masses of structured and unstructured data. Much of that data comes from outside their organization, in forms as diverse as research, reports, charts and even video files. However, while there's plenty of that data to preoccupy retail planners, that preoccupation often comes at the expense of gleaning intelligence from the data found in logs and other machine data produced by their own applications, websites, servers and supporting IT infrastructure components.
Machine data offers a wealth of information, but often goes untapped in the retailer's quest for external market research on buyer patterns, preferences and plans. Ironically, data from machine logs, while frequently overlooked, is often more time-critical than data from external sources. Machine logs that warn of impending retail application downtime could save a large online retailer millions of dollars in lost revenue during a holiday season.
In mid-2012, Target Stores lost, by one estimate, $464,000 in just 150 minutes of downtime caused by high server traffic. That downtime could have been avoided with access to machine logs; external data would have been the electronic equivalent of a paperweight.
The big machine data
Logs and other machine data are the output of an organization's IT assets – essentially every application and device in the organization's IT infrastructure.
Machine data is one of the fastest growing components of big data, in part because it's generated by virtually every piece of IT hardware and every software application – servers, retail and related applications, and mobile devices – and sensors and input devices of all kinds. In fact, IDC forecasts that machine data will account for 40 percent of the total data generated in 2020, up from 11 percent in 2005.
By managing this data proactively instead of only when something goes wrong, organizations can help mitigate risk, ensure service availability and promote operational efficiency. It's an amount of data that many retailers are not prepared to grapple with, but one that tools such as log management systems are purpose-built to handle.
The anemic adoption of analytics for machine data may have several causes. In any case, 58 percent of retail industry professionals in a first-quarter 2013 survey by Brick Meets Click stated they were concerned that their organizations were not using already available data. (Note: Data that is already and forever available to organizations is not externally sourced data, but the machine data, or logs, that organizations' IT systems spew out.)
In the same survey, 38 percent of respondents reported having no available budget for big data, and 35 percent stated that they had difficulty in collecting data. Although nothing is conclusive, those figures may reflect a misunderstanding of the nature of the big data in machine logs or the available analytics tools for deriving insights from those logs.
Understand logs to understand customers
Here's an example of what you can discover in machine logs. By using tools to monitor and manage logs on a very granular level, retailers can get valuable insights into customer behavior – that is, the behavior of their own customers, as opposed to anonymous consumers whose aggregated behavior is identified by third-party research rather than by the retailer's own customer data. Retailers can quickly spot:
- What products customers are buying, in what amounts, and in what varieties
- When peak buying times occur
- How pricing influences customer buying
- Correlations between social media campaigns and purchasing behavior
- How to increase sales and profitability by changing product prices in real time (or at specific times) when cost sensitivity is at its lowest or when sales of a product are lagging
While gleaning this information, log management and analytics can also provide a feedback loop into your data warehouse or business center to pinpoint the capacity or changes in inventory required to ensure customer responsiveness and reduce lost sales.
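As a rough illustration of the first two items in the list above, the sketch below counts purchases per hour and units sold per product from a handful of log lines. The log format, the field names (`sku`, `qty`) and the sample data are all hypothetical; real application logs will look different and would need their own parsing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log format: "<ISO timestamp> <event> sku=<id> qty=<n>"
SAMPLE_LOG = """\
2013-11-29T09:15:02 purchase sku=A100 qty=1
2013-11-29T09:47:30 purchase sku=B200 qty=2
2013-11-29T14:05:11 page_view sku=A100 qty=0
2013-11-29T20:01:45 purchase sku=A100 qty=3
2013-11-29T20:30:09 purchase sku=C300 qty=1
2013-11-29T20:59:59 purchase sku=A100 qty=1
"""


def purchase_records(log_text):
    """Yield (timestamp, sku, qty) for each purchase line; skip everything else."""
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[1] != "purchase":
            continue
        fields = dict(p.split("=", 1) for p in parts[2:])
        yield datetime.fromisoformat(parts[0]), fields["sku"], int(fields["qty"])


def purchases_by_hour(log_text):
    """Count purchase events per hour of day (peak buying times)."""
    return Counter(ts.hour for ts, _sku, _qty in purchase_records(log_text))


def units_by_sku(log_text):
    """Total units sold per product."""
    totals = Counter()
    for _ts, sku, qty in purchase_records(log_text):
        totals[sku] += qty
    return totals


if __name__ == "__main__":
    print(purchases_by_hour(SAMPLE_LOG).most_common(1))
    print(units_by_sku(SAMPLE_LOG).most_common())
```

A log management system does this kind of aggregation continuously and at far larger scale; the point here is only that the raw material for these answers is already sitting in the logs.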
But machine logs play another key role: helping IT planners understand how effectively their network infrastructure is supporting their business goals. For example, virtualization logs are used to gain operational insights into the performance, capacity, and security of virtual machines, and those logs contribute directly to maximizing overall IT performance.
Take the case of Atchik, a European provider of mobile community and entertainment services to mobile operators. While Atchik uses log management to monitor all the company's device and cloud application logs, its ultimate goal is to quickly identify and resolve customer issues. And so, the same logs serve both to improve operational performance and to understand – and retain – customers.
More concretely, because your machine logs can tell you what a customer ordered, you now have a deeper understanding of the customer. But those logs can also track what happened between the transaction point and the warehouse and can identify, for example, a shipping failure.
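One way to picture this is to correlate events from different systems by order ID and look for orders that never reached the final stage. The sketch below assumes the log lines have already been parsed into tuples; the system names, stage names and order IDs are invented for illustration.

```python
from collections import defaultdict

# Hypothetical, already-parsed events: (source system, order ID, stage).
EVENTS = [
    ("web", "order-1001", "order_placed"),
    ("web", "order-1002", "order_placed"),
    ("warehouse", "order-1001", "picked"),
    ("warehouse", "order-1002", "picked"),
    ("shipping", "order-1001", "shipped"),
    ("shipping", "order-1002", "label_error"),
]

# Assumed happy path for an order, in sequence.
EXPECTED_STAGES = ["order_placed", "picked", "shipped"]


def stalled_orders(events):
    """Return {order_id: first expected stage that never appeared in the logs}."""
    seen = defaultdict(set)
    for _source, order_id, stage in events:
        seen[order_id].add(stage)

    stalled = {}
    for order_id, stages in seen.items():
        for expected in EXPECTED_STAGES:
            if expected not in stages:
                stalled[order_id] = expected
                break
    return stalled


if __name__ == "__main__":
    for order_id, missing in stalled_orders(EVENTS).items():
        print(f"{order_id} never reached stage: {missing}")
```

Here the second order was placed and picked but never shipped, which is the kind of gap between the transaction point and the warehouse that would otherwise surface only as a customer complaint.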
Real-time analysis: A key to heading off operational failures
Machine logs often go unappreciated when an application is not down, or if IT planners can't correlate loss of revenue with network or server slowdowns. But this complacency comes at a risk. Many issues are easily solvable with the insight offered by machine data. More importantly, failures or breaches that are imminent or already silently “underway” can be nipped in the bud.
A log management system allows retailers to:
- Respond instantly to an incipient failure or breach. Precious seconds are at stake when the application is at risk of a crash.
- Set up proactive monitors that alert IT planners to impending issues, whether in hardware or software
In either case, retailers need the instant responsiveness made possible by machine logs. And that's where real-time responsiveness is critical. There's no time to wait for a batch job to export data, or to process the data afterward. That's why tools such as Hadoop or Cassandra, although powerful, are not ideal: they're not "real time." Even a loss of 30 minutes due to a downed application could cost an online retailer tens of thousands of dollars in lost sales, as well as hard-to-quantify losses due to churn. According to Aberdeen Group, the average cost of downtime increased 65 percent from June 2010 to February 2012, to approximately $165,000/hour.
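To put those numbers side by side, here is a back-of-the-envelope calculation using only figures quoted in this article. It assumes losses accrue linearly over the outage, which is a simplification: actual losses depend on time of day, season and how many customers come back later.

```python
def downtime_cost(hourly_cost, minutes):
    """Estimated loss for an outage, assuming cost accrues linearly."""
    return hourly_cost * minutes / 60


def implied_hourly_cost(total_loss, minutes):
    """Hourly rate implied by a reported loss over a given outage length."""
    return total_loss * 60 / minutes


if __name__ == "__main__":
    # Aberdeen Group average cited above: about $165,000 per hour.
    print(f"30 minutes at the Aberdeen average: ${downtime_cost(165_000, 30):,.0f}")
    # Target estimate cited earlier: $464,000 lost in 150 minutes.
    print(f"Hourly rate implied by the Target estimate: ${implied_hourly_cost(464_000, 150):,.0f}")
```

At the Aberdeen average, a 30-minute outage works out to $82,500, and the Target estimate implies a rate of about $185,600 per hour.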
Proactive monitoring provides critical alerts
Proactive monitoring is the corollary to real-time responsiveness. In short, it means one thing: forewarned is forearmed. And this forewarning comes by way of alerts that can highlight impending issues in the application, network, storage, or supporting IT infrastructure.
Take a hypothetical, but all too realistic, example. Let's say that your website went down two months ago, and troubleshooting after the fact revealed excessive loading of the network. Essentially, the network could not carry the traffic flowing between application nodes, and the site went down.
The solution: set up a real-time monitor or dashboard in your log management system to monitor network utilization and send an alert when utilization reaches a specified level. The alert highlights the impending problem in sufficient time to address it – allowing IT, for example, to rebalance loads or to spin up additional servers in a separate physical location or public cloud to offload the existing network infrastructure.
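The logic behind such a monitor can be very simple. The sketch below is not any particular product's alerting API; it is a minimal threshold check over a stream of utilization samples. The 80 percent threshold and the `sustained` parameter (how many consecutive samples must exceed the threshold before alerting, to avoid firing on a momentary spike) are assumptions added for illustration.

```python
def utilization_alerts(samples, threshold=0.8, sustained=3):
    """Return the timestamps at which an alert would fire.

    An alert fires once per episode, at the moment utilization has been at or
    above `threshold` for `sustained` consecutive samples.
    """
    alerts = []
    streak = 0
    for timestamp, utilization in samples:
        if utilization >= threshold:
            streak += 1
            if streak == sustained:
                alerts.append(timestamp)
        else:
            streak = 0  # utilization dropped back; start counting again
    return alerts


# Hypothetical one-minute samples of network utilization (fraction of capacity).
SAMPLES = [
    ("10:00", 0.55), ("10:01", 0.82), ("10:02", 0.79), ("10:03", 0.85),
    ("10:04", 0.88), ("10:05", 0.91), ("10:06", 0.93),
]

if __name__ == "__main__":
    print(utilization_alerts(SAMPLES))
```

Setting `sustained=1` gives the plain "alert when utilization reaches a specified level" behavior described above. Raising it trades a little warning time for fewer false alarms.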
This scenario once again highlights the importance of real-time responsiveness in a log management system. In short, an impending problem can be prevented only because a real-time alert allowed IT to take action in advance. Put simply, make sure your log management system delivers real-time notification of known and unknown indicators of future outages.
But what assurance do you have that your log management system can not only make sense of the billions of log entries that your infrastructure produces daily, but can also analyze its “findings” and alert IT to impending issues in real time? That's the job of the system's analytics engine, which has the multitasking role of monitoring logs in real time, monitoring OS and application performance, detecting anomalies, and identifying root causes.
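Commercial analytics engines use their own, more sophisticated methods, but the basic idea of anomaly detection can be sketched with a simple statistical baseline. The example below is an assumption-laden illustration, not a description of any vendor's engine: it flags a value when it lies more than a chosen number of standard deviations from the mean of the preceding window. The window size, the threshold and the sample error counts are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, z_threshold=3.0):
    """Return (index, value) pairs that deviate sharply from the trailing window."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: a z-score is undefined, so skip
        z_score = (values[i] - mu) / sigma
        if abs(z_score) > z_threshold:
            anomalies.append((i, values[i]))
    return anomalies


# Hypothetical count of application errors logged per minute.
ERRORS_PER_MINUTE = [4, 5, 6, 5, 4, 5, 6, 40, 5, 4]

if __name__ == "__main__":
    print(find_anomalies(ERRORS_PER_MINUTE))
```

The spike to 40 errors in one minute stands out against a baseline of four to six, with no one having had to define in advance what a "bad" error rate looks like. That is what makes this approach useful for the unknown indicators mentioned above.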
In retail, of course, it's critical to collect and manage data from custom marketing and sales applications. That's just good business. But the greater goal is to collect and analyze logs from all of the “stacks” in your IT systems. Those include industry-specific applications, storage servers, operating systems, ERP systems, web servers, and open-source applications – which all contribute to a composite picture of IT activities for both real-time and historical analysis. It's worth noting that point solutions – such as for scraping logs, creating scripts or developing applications – are not alternatives for real-time log management.
Logs help retailers improve their operational posture
A retailer's handling of machine logs also says a lot about its operational posture. A strong operational posture is built from an even stronger security posture – one built on the proposition that security breaches can not only endanger the retailer's consumers but also jeopardize the retailer's very existence.
Machine data is absolutely critical to strong security: it not only provides warning of a breach or hack, but also contains evidence and attack vectors of previous breaches and attempted attacks. In a retail organization, security is table stakes, and should therefore be a non-issue.
With security provisions established, the attention then goes to an operational posture built on knowing exactly what's going on with every piece of hardware and every application on the network, and even more – composite stats on network and server loads and other critical indicators.
Here are just a few such indicators: a failed hard drive, a load balancer failure, a software exception, an unreachable database, or a rise in network latency.
Cloud-based log management
Log management, like other enterprise applications, has gravitated to the cloud over time. Today, on-premise and cloud models are in use across enterprises, but three attributes of the cloud model can benefit retailers, each in a different way:
Cost avoidance: In a cloud model, capital expenditures are eliminated, and costs for retailers are geared to the seasonality of retail sales. Because the cloud eliminates capex for on-premise equipment and software, retailers can balance their expenses by opting for usage-based fees.
Seasonality: Similarly, an elastically scalable cloud model eliminates the need for retailers to provision log management for peaks or bursty data patterns – such as for seasonal and holiday sales – and also eliminates the severe underutilization that occurs when equipment and software run at perhaps 20 percent of capacity during retail off seasons. Retailers also reduce costs by eliminating the need for infrastructure support.
Embedded knowledge: Cloud solutions have the opportunity to derive insights across many customers who use similar infrastructure components (e.g., VMware, Linux, or Apache) and provide those insights to all of their customers who leverage similar software stacks. For example, when a zero-day failure caused by an upgrade to a version of an application or operating system is detected at a subset of customers, a warning can be proactively pushed to other users who leverage the same software components, and can thereby prevent the issue from ever occurring in their infrastructure.
Log management is critical to a retailer's business and security posture. It's essentially the analytics tool that allows you to understand and predict the buying behaviors of your own customers, while preventing outages and attacks that could cause customer churn or damage your brand.
About the author:
Bruno Kurtic joined Sumo Logic from SenSage, where he was the Vice President of Product Management. Before joining SenSage, Bruno was with the Boston Consulting Group (BCG) where he developed and implemented growth strategies for large high-tech clients. Prior to BCG, he spent six years at webMethods, where he was a Product Group Director for two product lines. At webMethods he started the west coast engineering team and played a key role in the acquisition of Active Software. He was also with Andersen Consulting's Center for Strategic Technology in Palo Alto and founded a software company that developed handwriting and voice recognition software. Bruno holds an undergraduate degree in Quantitative Methods and Computer Science from University of Saint Thomas and an MBA from Massachusetts Institute of Technology (MIT).