October 25, 2012, 9:43 PM -- Think your enterprise has a big data problem? Consider New York City: With a population of 8.2 million, it is far and away the largest city in the U.S. And it generates data--a lot of data--most of it in highly fragmented silos contained within the many municipal agencies and departments that keep the Big Apple running.
It has building identification numbers (BINs), borough block and lot (BBL) tax numbers, business licenses, parking tickets, health inspections, traffic violations, crimes, ambulance calls, fires and more.
"We're here at the Hilton in New York," says Michael Flowers, analytics director for the New York City Mayor's Office of Policy and Strategic Planning and director of the Financial Crime Task Force of the City of New York, speaking at the O'Reilly Strata Conference + Hadoop World. "It's a latitude and longitude; it's a postal address. It's a borough block and lot tax number. It's a building identification number and a number of other things that for each agency indicates where it needs to go and what it needs to do." "But in terms of leveraging all that information on behalf of one another," Flowers says, "it becomes extraordinarily difficult from an ontology and taxonomy standpoint. Moreover, all of those pieces of information are stored in different parts of the city, so it's incredibly fragmented. The systems themselves range from the brand new and awesome to play with to things that really aren't that different from pong--old mainframe systems that are really difficult to play with."
NYC's 311 Receives More than 65,000 Calls a Day
To make matters even more challenging, New York City's 311 non-emergency line receives more than 65,000 calls a day--ranging from complaints about noise and reports of potholes and broken sidewalks to questions about obtaining a copy of a deed or whether it's legal to keep piranhas.
"We gear our allocation of our agency resources based on basically a simple queue," Flowers says. "A call comes in and we respond to that call."
The only problem: Calls to 311 aren't necessarily a good predictor of where those resources really need to go. They are data, but they're not complete data.
So Flowers undertook a skunkworks project for New York City Mayor Michael Bloomberg. He and his team needed to show the New York City governmental community how the city's massive volumes of data could be used to allocate resources more effectively.