Deep into big data
Democratising data inside a company is important to making a business agile, says Kee Siong Ng, senior data scientist at EMC's Greenplum.

Traditionally, he says, there is a conflict over how to classify data mining: is it an information technology application or part of the wider business? "You see it moving back and forth," said Kee Siong in his keynote address at the 2012 SUNZ Conference in Wellington last week.

"One thing you must do is try to formalise that relationship, and really emphasise this collaborative environment," he says. "You might have the infrastructure being managed by IT, but the data side is really owned by the business."

This division, he says, is a major problem in a lot of organisations, except in internet companies like Yahoo and Google. "Even in places where they have done data mining for 10 to 15 years, they still have that wall."

His advice to enterprises: "Put all your data to work, have a data strategy, first invest in people, then technology," he says. "It is all about efficient, agile analytics."

Veering from tradition

He says a typical scenario in organisations is that critical data goes to a large data warehouse, which is controlled by IT. In this system, you can do "shallow type reporting", and there will also be "shadow systems" built across the organisation to support new data sources.

Users have to export data from the data warehouse and combine it with data from the "shadow systems" to do analysis. Most of the time they are also working on small samples, so the analysis is not complete.

He describes a new analysis practice for big data which he calls "MAD", for magnetic, agile and deep. The goal is to build models using all available data, he says. Every time you get a new data source, you don't have to worry about how it fits with the current scheme; you just put it inside the analytics warehouse in its raw form.
Agile allows analysts to work faster, and makes it easier for them to push back the results for deployment. Deep analysis allows data mining in large datasets.

High performance analytics in action

James Foster, chief technology strategist for SAS ANZ, says there is a misconception that big data refers only to companies like Amazon and Google. "It is not just about volume," he says. "It is not how large your data is that you can actually manage. [But] What is relevant for your organisations to make decisions and using the right information?"

High performance analytics in the context of big data is turning that data into information and insight, he says. "There is a difference between the information they are currently using and they could be using to drive value."

Across industries, he says, there are specific areas with lots of challenges around big data. In insurance, for instance, the business issues are around telematics, claims analytics, ratemaking and catastrophe modelling. In government, they are around tax fraud/collections, criminal justice, pension portfolio risk, child support areas and delinquencies, and in manufacturing they are around predictive asset failure and inventory allocation optimisation. In a lot of organisations, the issue is around customer analytics, which includes segmentation, acquisition and churn.

He says many organisations approach the situation from a technology level. "We need infrastructure to support it, we need to put data in one place and then we will figure out what to do with it." He likens it to a "build it and they will come, a Field of Dreams situation."

He suggests taking a different approach. "You have got to focus on the business problem: what am I trying to solve? What is the actual problem, and work backwards," he says. "What is the information that I can potentially use to solve that problem? How do I get that?

"Part of the big data challenge [is] if you go down the traditional approach of, 'I am going to structure my data in a certain way,' you are already making assumptions on what data is important rather than letting the data speak for itself.

"High performance analytics is saying, why can I not have all the data there and let the data tell me what is important?"

"You can use analytics to go across all the data, it might pick up trends or pieces of information that you would have excluded from previously."

His key advice to CIOs? "Treat data and information as an asset to your company, understand that you need to have an enterprise approach to analytics."

"Consider how analytics is used in the organisation and help drive those business outcomes rather than just providing infrastructure support.

"There are operational applications that are integrated with the front end. IT's role is to understand how that integration can work and make sure it delivers on that business value."

HSBC is a good example of the increasing importance of IT in these analytical environments, he says.

HSBC used analytics to understand its losses due to fraud. Typically, he says, fraud is detected after the fact, and there is a need to refine the detection models. For instance, if someone in Paraguay is buying a car using a stolen credit card, the owner is called after four hours and has to cancel the card.

HSBC is now using SAS fraud management to protect all credit card transactions in real time. The solution runs analytical processes quickly at the point of transaction to decide whether that transaction is potentially fraudulent. The shift is from fraud detection to fraud prevention, says Foster.

The result is a significantly lower incidence of fraud across tens of millions of debit and credit card accounts, improved detection rates and a significant reduction in false positives.
At HSBC it is important to run analytical processes quickly and in an environment that is highly available. IT in this case plays a critical role, he says. "They need to own that platform both from an availability perspective, from management and monitoring perspective and integration perspective."

"Working with the business is a throwaway statement," says Foster. "Everybody is saying IT and the business need to work together, but analytics is a core reminder of that."

Sidebar: When your job title is also a buzzword

The job title of data scientist is becoming one of the buzzwords in IT today. Fortune magazine calls the data scientist the "hot tech gig". It is also one of the most important roles in companies as they grapple with the reality of dealing with massive data generated internally and externally.

"It is fundamentally an interdisciplinary field," says Kee Siong Ng, principal data scientist at EMC Greenplum. Apart from dealing with data computations, you have to know statistics and computer science, understand big data and, more importantly, he says, have knowledge of and an "active interest" in how businesses and IT departments work.

But he says one of the most important traits for the job is "curiosity ... in everything."

"There will always be technology changes over the years. [And] If you are not naturally curious, you learn that things become relevant for three to four years, and then you become comfortable."

"You have to stay curious and keep finding things on your own," says Kee Siong, who moved to EMC over a year ago from the Australian National University, where he continues to be an adjunct lecturer. "That is, perhaps, the only skill you need because everything else you can learn."