Don't have enough sources for breaking news? Try Wikipedia

The Wikipedia Live Monitor can identify breaking news stories almost as quickly as social networks, and more reliably

As I write this post, it has been less than 24 hours since the bombings at the Boston Marathon, and almost as long since the Wikipedia page about the bombings was created. While many people, no doubt, first heard about the tragedy via social networks like Twitter and Facebook, the news made it to Wikipedia almost as fast. This raises the question of whether Wikipedia itself can be used to identify breaking news stories.

The Wikipedia Live Monitor in action

That’s a question that three researchers have recently attempted to answer. Thomas Steiner of Google Germany, Seth van Hooland of the Université Libre de Bruxelles and Ed Summers of the Library of Congress found that spikes in real-time edits to Wikipedia articles can be used to identify breaking news almost as quickly as social networks. While breaking news tends to hit social networks first, the authors found that the lag time between the news being mentioned on those networks and Wikipedia being updated was much shorter than previously speculated, in some cases only minutes.

Specifically, they created the Wikipedia Live Monitor, an open source tool based partially on Wikistream, which monitors live updates to Wikipedia articles. The Wikipedia Live Monitor watches, via IRC, for changes to Wikipedia articles in any of 42 languages (those with over 100,000 articles). Breaking news candidates are identified as article clusters (that is, multiple articles in different languages about the same topic) that have had frequent, recent edits by multiple editors. The researchers then validated (or rejected) each candidate by manually searching Twitter, Facebook and Google+ for mentions of the event in question.
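To make the idea of "frequent, recent edits by multiple editors" a bit more concrete, here is a rough Python sketch of how such a filter might look. This is not the Live Monitor's actual code: the Edit record, the function name and every threshold (a five-minute window, five edits, two editors, two languages) are my own assumptions for illustration, and it presumes that edits have already been grouped into topic clusters.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """One observed edit (hypothetical record, not the tool's real data model)."""
    cluster: str      # assumed language-independent key for the topic
    language: str     # Wikipedia language edition, e.g. "en"
    editor: str       # user name or IP address of the editor
    timestamp: float  # seconds since some common epoch


def breaking_news_candidates(edits, now, window=300.0,
                             min_edits=5, min_editors=2, min_languages=2):
    """Return the clusters with enough recent edits, editors and languages.

    All thresholds are illustrative assumptions, not values from the research.
    """
    stats = {}
    for e in edits:
        # Ignore edits outside the recent window (or stamped in the future).
        if e.timestamp > now or now - e.timestamp > window:
            continue
        s = stats.setdefault(
            e.cluster, {"edits": 0, "editors": set(), "languages": set()}
        )
        s["edits"] += 1
        s["editors"].add(e.editor)
        s["languages"].add(e.language)
    return sorted(
        cluster
        for cluster, s in stats.items()
        if s["edits"] >= min_edits
        and len(s["editors"]) >= min_editors
        and len(s["languages"]) >= min_languages
    )
```

In this sketch, a cluster that gets a burst of edits from several people across several language editions is returned as a candidate, while one person repeatedly editing a single article is not. Any candidate would still need the manual check against social networks described above.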

They found that monitoring Wikipedia edits was a reliable way to identify breaking news, with a lag time of approximately 30 minutes between the news being mentioned on social networks and Wikipedia being updated. However, they found that the lag time could be much shorter for “global breaking news like celebrity deaths.” For example, in the case of Pope Benedict XVI resigning, they found that their system would have identified the news based on Wikipedia edits only two minutes after Reuters broke the news on Twitter.

Interesting stuff, but is there a practical application for this knowledge? Well, the authors found that looking to Wikipedia first, then validating against social networks (as opposed to looking first to social networks for breaking news) resulted in fewer false positives, but just as many true positives. So maybe the upshot is that, using a tool like the Live Monitor, we can look to Wikipedia as a more reliable source of breaking news.

While I don’t think this will make me start using Wikipedia as a breaking news source, I must say that watching live updates to Wikipedia via the Live Monitor is kind of mesmerizing. If nothing else, this research underlines the power of crowdsourced information.

Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
