June 03, 2014, 6:00 AM —
Last week I wrote about a study of academic computer scientists and their research areas, which found a sharp recent rise in the study of informatics, a field that includes data mining, information retrieval and privacy. Coincidentally, last week also saw Jennifer Golbeck, the Director of the Human-Computer Interaction Lab at the University of Maryland, take part in a Reddit AMA session, during which she discussed the many things computer scientists like herself can learn about us from our online activity. It was a good opportunity to learn more about what this new kind of research means for you and me, and just how much privacy we give up by going online.
Some of what Golbeck had to say wasn’t so surprising. For example, right now the main use of much of this data is to serve advertising.
“Ads are the place where there seems to be money in this now.”
Also, data scientists (and advertisers) are mainly looking for correlations in the data (e.g., if you like a certain thing on Facebook, that implies something particular about you), not necessarily for causal relationships.
“... we are just looking at correlation.... We don't care why - the models just use that correlation to make a prediction.”
More interesting, I thought, were Golbeck’s comments about what our online activity can indirectly reveal about us, often in ways we wouldn’t expect and couldn’t really prevent.
“... many inferences come from likes that are way less obvious or even nonsensical. I tell the story in my TEDx talk … that liking the Facebook page for Curly Fries was shown to be one of the top predictors of high intelligence in a large study from Cambridge. That don't make a lot of sense, which means it can be very hard for an individual to prevent these algorithms from learning things about them.”
Although much of our online data is currently used for advertising, Golbeck also talked about its still untapped potential: what it says about us, both directly and indirectly, and the ways she could see it being mined in the future.
“I often (half) joke that if I get bored with this job, I would start a company that aggregates a lot of information about people, makes inferences over it (inferring things like commitment to your job, how well you work with others, how much of a procrastinator you are, etc.) and sell that report to businesses like your credit report gets sold. I think there is a lot of opportunity to make money off this data, but we are just starting to see this happen.”
If this sort of thing makes you shudder, Golbeck also shared advice on how best to protect your privacy, and what precautions she herself takes when going online. One thing she strongly recommends, which I hadn’t heard before, is regularly deleting your old social media activity.
“... the best thing you can do is crank up your privacy settings, be careful about what you share (don't assume those privacy settings are iron clad), and delete old stuff that you've posted liberally and frequently. None of this is surefire protection - content is archived, people make copies, privacy settings aren't perfect, etc - but these measures will make it a lot harder for people to track down potentially negative information to use against you.”
Golbeck says that she regularly deletes all of her social media activity that's more than 3 or 4 weeks old. Wow!
Her full AMA is worth reading if you’re at all concerned about how your online data is being used. Now, if you’ll pardon me, it’s time for me to clean out my old social media activity.
Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.