Saturday's botched bombing attempt in New York City provides an example of why the use of data mining approaches to uncover potential terrorism plots is a little like weather forecasting.
"You definitely need to do it, because it gives you warning of major storms," said John Pescatore, an analyst with Gartner Inc. and a former analyst with the National Security Agency. "But it's not going to tell you about individual raindrops."
Faisal Shahzad, a naturalized U.S. citizen of Pakistani descent, was arrested Monday at New York's J.F.K. airport in connection with an attempt to detonate a car bomb in Times Square. Shahzad, who is scheduled to be indicted on terrorism-related charges in Manhattan on Wednesday, was pulled off a plane bound for Dubai just minutes before the jetliner was scheduled to take off.
Shahzad is alleged to have parked an explosives-laden vehicle in Times Square, apparently with the intention of blowing it up. Media reports quoting the FBI and other authorities said the bomb could have caused a substantial number of deaths and injuries had it detonated.
The anti-terrorism task force was able to quickly identify Shahzad as the prime suspect in the case thanks to a series of mistakes the would-be bomber made. But for the moment, at least, there is little to show that authorities had any inkling of either Shahzad or his plot beforehand.
That fact is likely to provide more fodder for those who question the effectiveness of using data mining approaches to uncover and forecast terror plots. Since the terror attacks of Sept. 11, the federal government has spent tens of millions of dollars on data mining programs and behavioral surveillance technologies that are being used by several agencies to identify potential terrorists.
The tools typically work by searching through mountains of data in large databases for unusual patterns of activity, which are then used to predict future behavior. The data is often culled from dozens of sources including commercial and government databases and meshed together to see what kind of patterns emerge.
In January 2007, there were nearly 200 data mining programs planned or already operating throughout the federal government. Among them were the Automated Targeting System at the DHS for assigning "terror scores" to U.S. citizens and the Transportation Security Administration's Secure Flight program analyzing data about airline passengers. The FBI has several data mining initiatives under way, including some that target terrorists.
One of the most controversial programs was the Total Information Awareness (TIA) initiative, which was quietly launched in 2002 by the Defense Advanced Research Projects Agency but then abandoned in 2003 after Congress stopped funding it following a public outcry. Components of the program are, however, thought to be still alive and well within the U.S. Department of Defense.
What's unclear, though, is how effective these programs have been in identifying and stopping potential terrorist threats such as this latest bombing attempt in New York.
Critics of such programs argue that data mining for terrorists is essentially an exercise in futility given the vast amounts of data that would need to be sifted through on a daily basis, the lack of historical data upon which to base predictions, and the lack of information on patterns that point to terrorist activity.
Bruce Schneier, a noted security guru and chief security technology officer at BT, has long argued that using data mining approaches to search for potential terrorists is akin to searching for a needle in a haystack.
"Data mining works best when there's a well-defined profile you're searching for, a reasonable number of attacks per year, and a low cost of false alarms," Schneier wrote in a 2006 blog post. It's an approach that works well in areas such as fighting credit card fraud, where fraudulent patterns are fairly easily discernible, he said.
In the terrorism context, a data mining program can be vital in searching for more information and context on a specific, already identified individual such as Shahzad. But the much larger volumes of data that would need to be sifted through on a daily basis to identify the rare, potential terrorist greatly increase the possibility of false positives and negatives, Schneier said.
Schneier calculates that even the most accurate and finely tuned data mining system will generate one billion false alarms for every real terrorist plot it uncovers.
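The arithmetic behind a ratio of that size can be sketched in a few lines. The inputs below are illustrative assumptions chosen to reproduce the order of magnitude, not figures confirmed in this story: one trillion events sifted per year, 10 real plots among them, a 1% false positive rate and a 0.1% false negative rate. The function name `false_alarms_per_hit` is hypothetical.

```python
def false_alarms_per_hit(events, true_plots, false_positive_rate, false_negative_rate):
    """Return the expected number of false alarms for each real plot detected."""
    # Events that have nothing to do with a plot
    innocent_events = events - true_plots
    # Innocent events the system wrongly flags
    false_alarms = innocent_events * false_positive_rate
    # Real plots the system correctly flags
    hits = true_plots * (1 - false_negative_rate)
    return false_alarms / hits


# Assumed inputs (hypothetical, for illustration only)
ratio = false_alarms_per_hit(
    events=1_000_000_000_000,   # one trillion events per year
    true_plots=10,              # real plots hidden in the data
    false_positive_rate=0.01,   # 1 in 100 innocent events flagged
    false_negative_rate=0.001,  # 1 in 1,000 real plots missed
)
print(f"{ratio:.2e} false alarms per real plot")  # about 1.00e+09
```

Under these assumptions the result is driven almost entirely by how rare real plots are. Even cutting the false positive rate a hundredfold would still leave roughly 10 million false alarms for each plot found.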
Similar concerns prompted the National Research Council to issue a report in 2008 calling pattern-seeking data mining tools too unreliable for identifying potential terrorism suspects.
The continued and unchecked use of such tools poses potential privacy problems for users, the NRC noted in its 376-page report, which was prepared partly at the request of the U.S. Department of Homeland Security.
James Lewis, director and senior fellow at the Center for Strategic and International Studies, and leader of a team that developed a set of cybersecurity recommendations for President Obama, said it's hard to pass verdict without more information.
"We'd have to know if data mining had missed all plots or just this one," Lewis said. "If it catches some but not all, then the question is does it catch enough to justify itself," he said.
"One case isn't enough to tell and my problem with the critics is that they have an agenda - usually privacy - and see everything through that lens," Lewis said. "There is no single solution but data mining might be a useful part of the package. We don't have enough data to know," he said.
Jaikumar Vijayan covers data security and privacy issues, financial services security and e-voting for Computerworld. Follow Jaikumar on Twitter at @jaivijayan or subscribe to Jaikumar's RSS feed. His e-mail address is email@example.com.
This story, "NY bomb plot highlights limitations of data mining," was originally published by Computerworld.