Data visualization: Beneficial but perilous
There's something about data visualizations that makes people trust them more than they should.
Just ask engineer and data scientist Pete Warden. He created and published a data visualization that used public Facebook profiles to draw conclusions about regionalism in the U.S., and he was surprised by its reception.
"A New York Times columnist used it as evidence that the U.S. was perilously divided," he wrote recently in a blog post about the visualization, which he first published a couple of years ago. "White supremacists dug into the tool to show that Juan was more popular than John in Texan border towns, and so the country was on the verge of being swamped by Hispanics."
That's not exactly what he'd planned. His visualization was never meant to be treated as repeatable, peer-reviewed scientific research, he said.
He's not alone in creating a data visualization that has been given more credit than it deserves. "There is an element of 'wow, it's so professionally presented that it must be true,'" said Jim Bell, chief marketing officer for Jaspersoft.
And yet, more and more people, including those with little or no training in data analysis, are creating such visualizations. That's due to the proliferation of tools, some of them web-based, that make it very easy to build data visualizations.
It didn't use to be that way. Traditionally, companies might spend six months or longer integrating a business intelligence product and then hire a staff of people to make charts for everyone else in the company, said Dave Fowler, founder and CEO of Chartio, a cloud-based data visualization service. Those staffers needed "an odd combination of knowing how to run these big BI suites of software, knowing how to move big data sets around ... and then also doing charting visualization," he said.
But services and software from vendors like Chartio, Tableau, GoodData and others let just about anyone click and drag to put together visualizations. The benefit is that more people in a business can examine data and make smarter decisions. These tools are particularly attractive to smaller businesses that couldn't have afforded the software, let alone the trained people required to use it.
However, businesses are just starting to think, and worry, about how to make sure that the new kinds of workers using these tools have the right training, so that they build accurate visualizations and don't overstate their significance.
Early adopters and data science experts are developing best practices so that businesses can avoid potential problems while getting the benefit of allowing more workers to access data.
Training and a culture of skepticism help
Carwoo, a startup that aims to make car buying easier, is one of those businesses. It badly needed a tool that more people in the company could use, and it has been using Chartio for a few months.
Around a year ago, Rimas Silkaitis, a product manager at Carwoo, started looking for a better way to handle his co-workers' many requests for data visualizations. "What ended up happening over time was engineering and product management became inundated with these requests," he said. Often, they were creating the same reports over and over because they weren't aware that one had already been built. The process was very manual: they wrote SQL queries and, if necessary, Ruby code to "mash up" reports, he said.
He looked at higher-end products, like those from GoodData and MicroStrategy. "Then I realized, hey, we're a startup, we don't have that kind of money," he said. "That's when we found Chartio."
Now most of the 40-person company has access to Chartio. The exceptions are the sales and customer service teams, which have their own tools.
Silkaitis said he worries a bit about users misinterpreting data and creating bad visualizations, but he's implemented procedures that seem to be working so far.
It starts with new hires. "Anybody that comes on new to the company, I sit them down and walk them through our data model and give them a tutorial on how Chartio works," he said.
Training is key to ensuring that workers know how to use the tools appropriately, said Jonathan Dinu, a co-founder and instructor at Zipfian Academy, a school in San Francisco that offers a 12-week training course in data science. Although his students aren't new to data analytics, he offered some tips that anyone can follow.
For instance, Dinu recommends that people ask colleagues and bosses to review their reports, giving them the opportunity to challenge the analysis and assumptions used. "This sort of peer review on teams is crucial," he said.
That kind of questioning happens regularly at Carwoo, Silkaitis said. "In any meeting that we attend, there's usually some scrutiny," he said. Co-workers will ask each other how they came up with the data and why they think the results came out the way they did.
Many data visualization tools also have social features that make it easy for people to share their visualizations with colleagues. GoodData customers, for example, can push their visualizations to Chatter, Yammer and Jive.
Users of Microsoft's forthcoming Power BI service will be able to publish their visualizations to a public catalog or to an internal corporate catalog. Each visualization will include metadata showing who the author is and where the source data came from, which can help viewers decide whether the visualization is trustworthy, said Herain Oberoi, director of product marketing at Microsoft.
Dinu also stressed that anyone presenting a data visualization should explain the process used to develop it. Doing so could prevent the kind of misunderstanding Warden encountered when his Facebook visualization was cited as if it were a scientific study.
Garbage in, garbage out
Jaspersoft's Bell recommends having a point person in a business who is responsible for tasks such as setting up the business layer, making sure that users are accessing the right data and making sure that databases are labeled correctly.
Silkaitis is essentially that point person for Carwoo. He built a summary table designed to make it easy for Carwoo's nontechnical staff to find the data they need without having to understand the company's whole data model, he said. "People can query that and not worry about accuracy or what the metrics on those tables actually mean," he said.
The experts also said it's critical that the data end users access is clean. "At the end of the day if the data is clean then the visualization you build will be representative of the clean data," said Microsoft's Oberoi. "If the data is dirty, it doesn't matter how simple the visualization is, it will be representative of incomplete data."
Silkaitis has also tried to give novices a leg up by creating a number of dashboards, so that workers without much experience in data analysis don't have to figure out how to build them. The dashboards meet around 65 percent of workers' needs, he said.
"For that other percentage, that's where it gets a little hairy and I try to make myself available to people," he said. "If they have questions they can bounce them off me. We talk it through and get their data questions answered. Hopefully, that's a learning exercise that they can go and apply to any other chart they build."
Some vendors similarly offer dashboards designed to meet common needs as a way to guide end users in the right direction. For instance, GoodData offers prebuilt templates for commonly used dashboards like sales pipeline analysis.
GoodData is also working on new technology that recommends the best kind of visualization for the data a user is working with and automatically populates the chart with that data. "There is a science to the art of data visualization," said Hubert Palan, vice president of product management at GoodData. "There are best practices for when you should use bar charts or when you should use line charts or when you should use tables."
Businesses should also think carefully about which roles should have access to the tools, so that only people with the right training use them. Some products also help ensure that the right users have access to the right data. GoodData provides controls that could, for instance, restrict a West Coast vice president of sales to West Coast data only, Palan said.
Over time, these best practices will make it common for more and more people in a business to use data visualization tools. Data literacy will be a requirement for many more jobs, Chartio's Fowler predicts. "I think the data scientist of today is the IT guy of the '90s," he said. "In the '90s, nobody knew how to use a computer so you had an IT guy who could come restart our machines all the time. Now people have just learned how to use computers and so you don't have as many guys standing around teaching you how to restart them. With data science, more and more people will learn, and you won't need a data scientist looking over your shoulder."