October 12, 2010, 5:00 PM — Everywhere you go on the Web, you leave breadcrumbs behind -- a comment here, a "like" there, a tweet, and so on. Those tracks may one day come back to haunt you.
Today's reason to be paranoid: Wall Street Journal reporters Julia Angwin and Steve Stecklow's fascinating piece detailing the growth in "scraping" Web sites for information.
Widely known to Web-savvy types but obscure to the general public, "scraping" involves using software to hoover up data off Web sites -- usually information posted in public forums or social networks -- and tuck it away into a database, usually for the purpose of selling it to someone else.
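For the curious, here is roughly what that "hoover it up and tuck it into a database" loop looks like in miniature. This is only a sketch under stated assumptions, not how any company named in the WSJ piece actually does it: the forum markup, the `PostParser` class, the `scrape_to_db` function, and the `posts` table are all made up for illustration, and the page is a hard-coded string so nothing touches a live site.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical forum markup, hard-coded so the sketch never hits a real site.
SAMPLE_PAGE = """
<div class="post"><span class="author">user_a</span><p class="body">First public comment.</p></div>
<div class="post"><span class="author">user_b</span><p class="body">Second public comment.</p></div>
"""


class PostParser(HTMLParser):
    """Collects (author, body) pairs from the assumed markup above."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None   # which field the parser is currently inside, if any
        self._current = {}   # fields gathered so far for the post in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("author", "body"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        # A closing </p> while inside the body field means the post is complete.
        if self._field == "body" and tag == "p":
            self.posts.append((self._current["author"], self._current["body"]))
            self._current = {}
        if tag in ("span", "p"):
            self._field = None


def scrape_to_db(html, conn):
    """Parse posts out of html, store them in conn, return how many were stored."""
    parser = PostParser()
    parser.feed(html)
    conn.execute("CREATE TABLE IF NOT EXISTS posts (author TEXT, body TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", parser.posts)
    conn.commit()
    return len(parser.posts)


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
count = scrape_to_db(SAMPLE_PAGE, conn)
rows = conn.execute("SELECT author, body FROM posts ORDER BY rowid").fetchall()
```

Swap the hard-coded string for pages fetched in a loop and the in-memory database for one on disk, and you have the basic shape of the thing. The point is how little code sits between a public post and somebody else's database.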
Companies scrape Web sites to find out what people are saying about their products, find people to sell products to, or figure out whom to hire. Marginally legal, scraping is essentially stealing, even if the information is out in public for all to see. Worse, it can violate your privacy in a big way.
The WSJ zooms in on the case of a site called PatientsLikeMe, whose "mood" discussion boards were thoroughly scraped last May, violating the privacy of hundreds of users who posted information about their own personal struggles with mood disorders, including the medications they use.
The scraper in question? The Nielsen Company. Yes, that's right, the TV ratings people. They also operate several Net-centric data mining concerns, one of which pulls nasty sh** like this (or did, until its new CEO stopped the practice, shortly after PatientsLikeMe sent them a cease-and-desist nastygram).
Nielsen is hardly alone amongst the scrapers. Gleaning information from the InterWebs is becoming a big-ticket business. To wit:
Marketers spent $7.8 billion on online and offline data in 2009, according to the New York management consulting firm Winterberry Group LLC. Spending on data from online sources is set to more than double, to $840 million in 2012 from $410 million in 2009.