March 24, 2011, 4:38 PM
NEW YORK -- As real-time and batch analytics evolve using big data processing engines such as Hadoop, corporations will be able to track our activities, habits and locations with greater precision than ever thought possible.
"It will change our existing notions of privacy. A surveillance society is not only inevitable, it's worse. It's irresistible," said Jeff Jonas, a distinguished engineer with IBM. Jonas spoke to a packed house of several hundred people Wednesday at the Structure Big Data 2011 conference here.
For businesses, knowing where people are by using geo-locational data will help them personalize advertising and marketing materials over the Web. For example, if a company knows a customer is in Aruba, it won't bother offering him or her advertising for restaurants in New York, but instead it may market sun-tanning lotion or scuba-diving excursions.
Location data can also help a business tell with accuracy which potential customer is which. For example, if five people living in the U.S. share the same name and date of birth but live in different cities, knowing where each one is at a given time verifies their identities.
"Just look at the last 10 years of address histories ... it is very telling if this is the same person or not," Jonas said. "Two different things cannot occupy the same space at the same time."
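The reasoning in that quote can be illustrated with a short Python sketch. It is not Jonas' or IBM's method, only a minimal illustration that assumes each record carries an address history as a list of stays with a city, a move-in year and a move-out year. The names Residence, conflicts and could_be_same_person are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residence:
    """One entry in an address history (hypothetical record layout)."""
    city: str
    start_year: int  # year the person moved in (inclusive)
    end_year: int    # year the person moved out (exclusive)


def conflicts(a: Residence, b: Residence) -> bool:
    """True if two stays overlap in time but are in different cities."""
    overlap = a.start_year < b.end_year and b.start_year < a.end_year
    return overlap and a.city != b.city


def could_be_same_person(history_a, history_b) -> bool:
    """Two records can describe one person only if no pair of stays
    puts that person in two different cities at the same time."""
    return not any(conflicts(a, b) for a in history_a for b in history_b)


# Two records with the same name and date of birth (made-up data).
record_1 = [Residence("Boston", 2001, 2005)]
record_2 = [Residence("Denver", 2005, 2010)]   # consistent: a move in 2005
record_3 = [Residence("Denver", 2003, 2008)]   # overlaps the Boston stay

print(could_be_same_person(record_1, record_2))  # True
print(could_be_same_person(record_1, record_3))  # False
```

A check like this can only rule a match out. Two histories that never conflict may still belong to different people, so a real system would weigh more evidence than this sketch does.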
Jonas said 600 billion electronic transactions are created in the U.S. every day, many of which come from geo-locational data generated by cell phones, whose signals let cellular towers triangulate a person's exact location at any time. Wireless providers have that data in real time.
By looking at data collected over years, corporations can learn how you spend your time, where you work, and whom you typically spend time with.
"This is super food [for big data analytics]," Jonas said. "With 87% certainty, I can tell you where you'll be next Thursday at 5:35 p.m."
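How a prediction of that kind might be produced from location history can be sketched in a few lines of Python. This is an assumption-laden illustration, not the model Jonas described: it simply counts where a person was seen in each weekday-and-hour slot and reports the most frequent place along with its share of the sightings. The function names build_model and predict and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def build_model(sightings):
    """Count places per (weekday, hour) slot from (datetime, place) pairs."""
    model = defaultdict(Counter)
    for when, place in sightings:
        model[(when.weekday(), when.hour)][place] += 1
    return model


def predict(model, when):
    """Return (most frequent place, share of sightings) for the slot
    that `when` falls in, or None if the slot has no history."""
    counts = model.get((when.weekday(), when.hour))
    if not counts:
        return None
    place, hits = counts.most_common(1)[0]
    return place, hits / sum(counts.values())


# Made-up history: four Thursdays in March 2011, all in the 5 p.m. hour.
history = [
    (datetime(2011, 3, 3, 17, 30), "gym"),
    (datetime(2011, 3, 10, 17, 30), "gym"),
    (datetime(2011, 3, 17, 17, 30), "gym"),
    (datetime(2011, 3, 24, 17, 10), "office"),
]
model = build_model(history)

# Where is this person likely to be next Thursday at 5:35 p.m.?
print(predict(model, datetime(2011, 3, 31, 17, 35)))  # ('gym', 0.75)
```

The share returned here is only a frequency over past sightings. It is not a calibrated probability, and with years of data a real system would have far more signal to work with.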
Big data, an industry term that refers to large data warehouses, includes machine- and human-generated data such as computer system log files, financial services electronic transactions, Web search streams, e-mail metadata, search engine queries and social networking activity. In 2010 alone, 1.5 zettabytes of this data was created, most of which was machine-generated. Corporations filled their data center storage systems with about 16 exabytes of that data last year, according to Jason Hoffman, founder and chief scientist at cloud software provider Joyent.
Bill McColl, CEO of analytics engine vendor Cloudscale, said that until now, big data analytics has been about off-line queries or "MapReduce" algorithms, which were developed by Google. But 90% of corporate data warehouse users say they want to move forward into a world with real-time analytics.
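For readers unfamiliar with the MapReduce style of off-line query McColl mentions, the idea can be sketched in plain Python. This runs in a single process and does not use the Hadoop API; the function names and the sample records are hypothetical. A map step emits key-value pairs, a shuffle step groups them by key, and a reduce step combines each group, here to count transactions per city.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one (user, city) transaction into a (city, 1) pair."""
    _user, city = record
    yield city, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce over a batch of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Made-up batch of transactions.
transactions = [
    ("alice", "New York"),
    ("bob", "Aruba"),
    ("carol", "New York"),
]
print(run_job(transactions))  # {'New York': 2, 'Aruba': 1}
```

The whole batch must be available before the job produces an answer, which is why this style suits off-line queries rather than the real-time analytics that warehouse users say they want.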