NY bomb plot highlights limitations of data mining

Like weather forecasting, data mining can predict major storms but not where each drop will fall

By Jaikumar Vijayan, Computerworld

Saturday's botched bombing attempt in New York City illustrates why using data mining to uncover potential terrorist plots is a little like weather forecasting.

"You definitely need to do it, because it gives you warning of major storms," said John Pescatore, an analyst with Gartner Inc. and a former analyst with the National Security Agency. "But it's not going to tell you about individual raindrops."

Faisal Shahzad, a naturalized U.S. citizen of Pakistani descent, was arrested Monday at New York's John F. Kennedy International Airport in connection with an attempt to detonate a car bomb in Times Square. Shahzad, who is scheduled to be indicted on terrorism-related charges in Manhattan on Wednesday, was pulled off a plane bound for Dubai minutes before the jetliner was scheduled to take off.

Shahzad is alleged to have parked an explosives-laden vehicle in Times Square, apparently with the intention of blowing it up. Media reports quoting the FBI and other authorities said the bomb could have caused a substantial number of deaths and injuries had it detonated.

The anti-terrorism task force was able to identify Shahzad quickly as the prime suspect in the case, thanks to a series of mistakes the would-be bomber made. So far, however, there is little to show that authorities had any inkling of Shahzad or his plot beforehand.

That fact is likely to provide more fodder for those who question the effectiveness of using data mining approaches to uncover and forecast terror plots. Since the terror attacks of Sept. 11, the federal government has spent tens of millions of dollars on data mining programs and behavioral surveillance technologies that are being used by several agencies to identify potential terrorists.

The tools typically work by searching mountains of data in large databases for unusual patterns of activity, which are then used to predict future behavior. The data is often culled from dozens of sources, including commercial and government databases, and merged to see what patterns emerge.
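To make that description concrete, here is a toy Python sketch of the two steps: it joins records from two hypothetical sources on a shared ID, then flags profiles whose values sit far from the group average (a simple z-score test). The data, the field names and the 1.5 threshold are all invented for illustration; real systems are far more elaborate, and this is not a description of how any program named in this article works.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records from two separate sources, keyed by a shared person ID.
purchases = [("p1", 40.0), ("p2", 35.0), ("p3", 38.0), ("p4", 950.0), ("p1", 42.0)]
trips = [("p1", 1), ("p2", 0), ("p3", 2), ("p4", 6)]


def merge_sources(purchases, trips):
    """Combine both sources into one profile per ID (total spend, trip count)."""
    profiles = defaultdict(lambda: {"spend": 0.0, "trips": 0})
    for pid, amount in purchases:
        profiles[pid]["spend"] += amount
    for pid, count in trips:
        profiles[pid]["trips"] += count
    return dict(profiles)


def flag_outliers(profiles, feature, threshold=1.5):
    """Return IDs whose `feature` lies more than `threshold` population
    standard deviations from the mean across all profiles."""
    values = [p[feature] for p in profiles.values()]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every value identical: nothing stands out
    return sorted(
        pid for pid, p in profiles.items() if abs(p[feature] - mu) / sigma > threshold
    )


profiles = merge_sources(purchases, trips)
print(flag_outliers(profiles, "spend"))  # ['p4']
print(flag_outliers(profiles, "trips"))  # ['p4']
```

With only four profiles, no value can sit more than √3 (about 1.73) standard deviations from the mean, which is why the threshold is set low here. Flagging p4 says only that its numbers are unusual, not that anything is wrong.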

In January 2007, nearly 200 data mining programs were planned or already operating across the federal government. Among them were the Department of Homeland Security's Automated Targeting System, which assigns "terror scores" to U.S. citizens, and the Transportation Security Administration's Secure Flight program, which analyzes data about airline passengers. The FBI has several data mining initiatives under way, including some that target terrorists.

One of the most controversial programs was the Total Information Awareness (TIA) initiative, which was quietly launched in 2002 by the Defense Advanced Research Projects Agency but then abandoned in 2003 after Congress stopped funding it following a public outcry. Components of the program are, however, thought to be still alive and well within the U.S. Department of Defense.

What's unclear, though, is how effective these programs have been in identifying and stopping potential terrorist threats such as this latest bombing attempt in New York.

Critics of such programs argue that data mining for terrorists is essentially futile, given the vast amount of data that would have to be sifted through every day, the scarcity of historical data on which to base predictions, and how little is known about which patterns point to terrorist activity.

Bruce Schneier, a noted security guru and chief security technology officer at BT, has long argued that using data mining approaches to search for potential terrorists is akin to searching for a needle in a haystack.

"Data mining works best when there's a well-defined profile you're searching for, a reasonable number of attacks per year, and a low cost of false alarms," Schneier wrote in a 2006 blog post. The approach works well in areas such as credit card fraud detection, where fraudulent patterns are fairly easy to discern, he said.
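Why the number of attacks matters can be shown with Bayes' rule. The Python sketch below applies one hypothetical detector, which catches 99.9% of real cases and wrongly flags 1% of innocent ones, to two assumed base rates: 1 in 1,000 (a stand-in for a fairly common behavior such as card fraud) and 1 in 100 billion (a stand-in for something as rare as a terrorist plot). All of these numbers are illustrative assumptions, not figures from Schneier or from any real system.

```python
def positive_predictive_value(base_rate, true_positive_rate, false_positive_rate):
    """Fraction of flagged cases that are genuine, by Bayes' rule."""
    true_alarms = base_rate * true_positive_rate
    false_alarms = (1 - base_rate) * false_positive_rate
    return true_alarms / (true_alarms + false_alarms)


# Same hypothetical detector in both cases: catches 99.9% of real cases,
# wrongly flags 1% of innocent ones.
TPR, FPR = 0.999, 0.01

fraud_like = positive_predictive_value(1e-3, TPR, FPR)    # assumed: 1 in 1,000 is bad
terror_like = positive_predictive_value(1e-11, TPR, FPR)  # assumed: 1 in 100 billion is bad

print(f"fraud-like:  {fraud_like:.3f}")   # 0.091
print(f"terror-like: {terror_like:.1e}")  # 1.0e-09
```

With these inputs, about one flag in 11 is genuine in the first case, versus roughly one in a billion in the second. The detector is identical; only the rarity of the target changes.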

In the terrorism context, data mining can be vital for gathering more information and context on a specific, already identified individual such as Shahzad. But sifting through the far larger volumes of data needed each day to find the rare potential terrorist greatly increases the chance of false positives and false negatives, Schneier said.

Schneier calculates that even the most accurate and finely tuned data mining system will generate one billion false alarms for every real terrorist plot it uncovers.
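The arithmetic behind a figure like this is ordinary base-rate reasoning. The Python sketch below uses illustrative assumptions that are not stated in this article (roughly a trillion screened events a year, 10 real plots among them, a 1% false-positive rate and a 0.1% false-negative rate) and shows how they produce a ratio on the order of a billion false alarms per plot caught. Treat it as a back-of-the-envelope sketch, not a reproduction of Schneier's exact method.

```python
def false_alarm_load(events_per_year, real_plots, false_positive_rate, false_negative_rate):
    """Return (false alarms per year, plots caught per year,
    false alarms per plot caught, false alarms per day)."""
    false_alarms = (events_per_year - real_plots) * false_positive_rate
    plots_caught = real_plots * (1 - false_negative_rate)
    return (
        false_alarms,
        plots_caught,
        false_alarms / plots_caught,
        false_alarms / 365,
    )


# Illustrative assumptions: ~1 trillion screened events a year, 10 real plots,
# a 1% false-positive rate and a 0.1% false-negative rate.
alarms, caught, per_plot, per_day = false_alarm_load(1e12, 10, 0.01, 0.001)
print(f"{per_plot:.2e} false alarms per plot caught")  # 1.00e+09
print(f"{per_day:,.0f} false alarms per day")
```

Even with a detector that misses only one plot in 1,000, the roughly 10 billion false alarms a year work out to about 27 million a day for investigators to clear.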


Originally published on Computerworld.