As you may have suspected, there is a problem with groups such as the EFF, EPIC, ACLU and others who weave wild stories about the amount of information advertisers are trying to collect, as they advocate for consumer privacy rules on the Internet.
There is also a problem with the paranoid obsessive people in your office who use special web browsers, avoid all the sites everyone else depends on, clean out their cookie caches every 10 minutes, encrypt and password protect even documents they send to you and blacklist so many things it's hard to even get a reply in email back to them:
They're not paranoid enough.
A big enough conspiracy can make paranoia look naive
According to results of a study being conducted at Stanford's Center for Internet & Society, not only do many popular web sites wring as much personally identifiable information as possible out of their own users, they also funnel that data to other web sites, spreading news of one user's browsing habits to as many as 22 companies with every visit to a particular site.
Worse, the information leaks aren't just anonymous clicktrails – a record of the pages you've viewed and links you've clicked, without anything that could identify you personally.
They usually – not sometimes, not frequently – usually contain enough information to identify you personally as the one who visited a particular site.
The rate of leakage makes privacy statements like that of Home Depot – which, typically, promises not to sell or rent your personal information to third parties – irrelevant, misleading or outright lies.
The study, from Stanford Ph.D. candidate Jonathan Mayer, found that, by the most conservative, forgiving standards he could reasonably use, at least 45 percent of the 185 popular web sites he studied leak user names or user IDs.
Third-party sites can collect that information, use a simple algorithm to match it with social-networking profiles or other publicly available data, and name the end user accurately 7 times out of every 10, according to a related study published in May that maps out a huge and growing mismatch between the technologies able to violate users' privacy and those available to protect it (PDF).
They know all your secrets...and offer the perfect solution at an attractive price!
The real revelation isn't that private data can leak out when you hit a web site; it's the speed of the leakage and the number of receivers.
Click a local ad on HomeDepot.com and 13 companies get your name and email address. Type the wrong password into WSJ.com and 7 companies get your email. Click the validation link in the signup email for a Reuters newsletter and 5 companies get your email.
Interact with classmates.com and 22 companies get your full name; Bleacher Report sends it to 15 companies.
Changing user settings on Metacafe sends your full name, birthday, email and physical addresses and phone number to two companies, Mayer found.
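One way to picture how a count like "13 companies" could be arrived at is to look at every request a page triggers and note which outside hosts receive a value that identifies the user. The sketch below is only an illustration under stated assumptions, not the method used in the study: the host names, the sample requests and the helper names `site_of` and `leak_recipients` are all made up, and the "last two labels" rule is a crude stand-in for real domain matching.

```python
from urllib.parse import urlsplit, unquote

def site_of(host: str) -> str:
    """Crude guess at the owning site: the last two labels of the host name."""
    return ".".join(host.lower().split(".")[-2:])

def leak_recipients(first_party: str, request_urls: list[str], secret: str) -> set[str]:
    """Return the third-party sites whose request URL contains the secret value."""
    home = site_of(first_party)
    recipients = set()
    for url in request_urls:
        host = urlsplit(url).hostname or ""
        site = site_of(host)
        # Decode percent-escapes so "pat%40mail.example" matches "pat@mail.example".
        if site != home and secret.lower() in unquote(url).lower():
            recipients.add(site)
    return recipients

# Hypothetical requests made while loading one page (every host here is invented).
requests_seen = [
    "https://www.shop.example/account?email=pat%40mail.example",
    "https://ads.tracker-one.example/pixel?ref=email%3Dpat%40mail.example",
    "https://cdn.tracker-two.example/lib.js",
    "https://stats.tracker-three.example/hit?u=pat@mail.example",
]
leaks = leak_recipients("www.shop.example", requests_seen, "pat@mail.example")
print(sorted(leaks))
```

In this toy run two of the three outside hosts receive the email address; the first-party request is ignored because the user gave that site the address deliberately.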
The studies refer to the whole process as private-data leakage, but Mayer noted specifically that in web advertising, "leak" does not mean "accident."
In computer security, leakage is a term of art for an information flow – some instances of leakage are entirely intentional. For example, OkCupid, a free online dating website, appears to sell user information to the data providers BlueKai and Lotame, including gender, age, ZIP code, relationship status, and drug use frequency. – Jonathan Mayer, Tracking the Trackers: Where Everybody Knows Your Username.
And that's just looking at what information the first-party site hands to third parties accidentally (sometimes), for a fee, as part of a data-trade or as a premium for buying an ad in the first place.
Third parties – companies you never chose to deal with or agreed should be allowed to use your data – can also identify you by buying profile data from a matching service to narrow down your identity, geo-locating your IP address to reduce the number of possible false IDs, exploiting security holes in your browser to gather more of your data, or "deanonymizing" you by matching generic new clicktrails, usernames or other information with data that's already confirmed to be about you, according to Arvind Narayanan, another researcher in Stanford's CIS.
"Identification of a user affects not only future tracking, but also retroactively affects the data that's already been collected. Identification needs to happen only once, ever, per user," Narayanan wrote.
Once again, though, that's just the beginning.
All those methods depend on data leaking directly to third parties.
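The retroactive effect described in the quote above can be shown with a toy tracker log. This is a minimal sketch under assumed data: the tracking IDs, page paths, the name and the function `attribute_history` are hypothetical, and a real tracker's records would look different. The only point is that a single identifying event is enough to put a name on everything already stored under the same ID.

```python
from collections import defaultdict

# Hypothetical tracker log: (tracking_id, page_visited, leaked_name_or_None).
log = [
    ("id-7f3a", "/news/politics", None),
    ("id-7f3a", "/health/insomnia", None),
    ("id-22c1", "/sports", None),
    ("id-7f3a", "/account/settings", "Pat Example"),  # the single identifying leak
    ("id-7f3a", "/jobs/search", None),
]

def attribute_history(events):
    """Attach every event for a tracking ID to a name once that ID is named anywhere."""
    names = {}
    pages = defaultdict(list)
    for tracking_id, page, name in events:
        pages[tracking_id].append(page)
        if name is not None:
            names[tracking_id] = name
    # IDs that were never named stay out of the result; they remain pseudonymous.
    return {names[t]: visited for t, visited in pages.items() if t in names}

history = attribute_history(log)
print(history)
```

Visits made before the leak (the first two rows) and after it (the last row) all end up under the same name, while the second ID, which never leaked anything, stays unnamed.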
If you won't give them data, they may just take it
Even without those links, several different flavors of "supercookies" can also tag your browser with a unique identifying number, add a programmable "Flash cookie," or profile its characteristics – all the personalization, add-ins and mods you've done – to make it uniquely identifiable.
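Profiling a browser's characteristics can be pictured as hashing a list of traits into one identifier. The sketch below is an assumption-laden illustration rather than any particular tracker's technique: the trait names, their values and the `fingerprint` function are invented, and real scripts would read comparable values from the browser itself.

```python
import hashlib
import json

def fingerprint(traits: dict) -> str:
    """Hash a browser's reported traits into a short, stable identifier."""
    # Sorting keys makes the result independent of the order traits were collected in.
    canonical = json.dumps(traits, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical trait sets for two otherwise similar browsers.
browser_a = {
    "user_agent": "ExampleBrowser/5.0",
    "screen": "1920x1080x24",
    "timezone": "UTC-5",
    "plugins": ["Flash 10.3", "PDF Viewer"],
    "fonts": ["Arial", "Comic Sans MS", "Garamond"],
}
browser_b = dict(browser_a, fonts=["Arial", "Garamond"])  # one font fewer

print(fingerprint(browser_a))
print(fingerprint(browser_b))
```

The same traits always hash to the same value, with no cookie stored anywhere, and changing a single installed font produces a different value. That is why heavy customization can make a browser easier, not harder, to recognize.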
As HTML5 becomes more widespread, its ability to store data locally and re-use it in later sessions will add a whole new genus to the supercookie family.
The charmingly persistent "zombie cookies" can even reconstitute themselves after you've found and deleted them.
The third parties that zombie-tag you don't even necessarily have to be sleazy online advertisers. Zombie cookies are for software vendors, too.
Microsoft admitted in August – after Mayer produced the evidence – that a script designed to sync data from cookies planted by any Microsoft site would sometimes re-create a unique ID the user had erased.
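The respawning behavior can be modeled in a few lines. This is a toy simulation, not a description of Microsoft's script or of any real tracker: the `Browser` class, the three store names and `tracker_visit` are hypothetical stand-ins for the separate places an identifier might be kept.

```python
import uuid

class Browser:
    """Toy model of a browser with several independent storage areas."""
    STORES = ("http_cookie", "flash_lso", "html5_local_storage")

    def __init__(self):
        self.stores = {name: None for name in self.STORES}

def tracker_visit(browser: Browser) -> str:
    """Reuse any surviving ID, mint one otherwise, then rewrite every store."""
    surviving = [value for value in browser.stores.values() if value is not None]
    tracking_id = surviving[0] if surviving else uuid.uuid4().hex
    for name in browser.stores:
        browser.stores[name] = tracking_id
    return tracking_id

b = Browser()
first = tracker_visit(b)
b.stores["http_cookie"] = None   # the user clears ordinary cookies
second = tracker_visit(b)        # the ID is restored from the other stores
print(first == second)           # True: the deleted cookie has been re-created
```

As long as one copy survives, the next visit restores the rest. In this model the ID only changes if every store is emptied at the same time.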
Tin-foil-hat browser extensions to be de rigueur
ExtremeLabs researcher and ITWorld columnist Tom Henderson suggests the only solution is to run browsers in a sandbox or virtual machine to corral the cookies in a way that lets them work while you're browsing – so you can actually use the sites you visit – but delete the whole container and all the cookies inside it once you're done.
Security software vendors claim they're the answer. But the first link in that sentence is an article from July 2010 claiming the problem had been solved. The second is a piece from August of this year suggesting how facial-recognition software could be misused by advertisers to ID users involuntarily.
The number of sources to exploit and the energy advertisers have been putting into the effort suggest a lot more tenacity than a simple Do Not Track policy would fix.
It suggests the same kind of ongoing spy-vs-spy competition that has been going on for years between virus and antivirus developers, with no resolution.
It suggests that even with legislation to prevent it, there will be a lot of cheating and a lot of surreptitious ways to gather and confirm private data.
Ultimately it suggests the only place it would be possible to start closing off the vein is to make the brazen, wholesale collection, use and manipulation of private data illegal.
That, at least, would add some pressure to slow the flood of private data and push the balance of power back to the point that it's possible to be too paranoid about the kind of information web sites are collecting about you.