ACM CHI: more search could be crowdsourced

Search engines could expand the range of answers they provide through simple filtering and crowdsourcing collaboration

Search engines could use crowdsourcing to expand the range of answers they give to their users, a group of researchers from Microsoft and the Massachusetts Institute of Technology has concluded.

Today, Web search engines primarily use computer-run page ranking algorithms to generate results for user-submitted queries. For a small number of simple queries, however, services return the exact answer the user is seeking. Google, for instance, can return the local show times for a movie, the weather for a certain region, or the result of a simple math problem.

This range of answers could be radically expanded through some data mining techniques and crowdsourced editing, according to M.I.T. researcher Michael Bernstein, who summarized the group's work at the Association for Computing Machinery's Conference on Human Factors in Computing Systems, being held this week in Austin, Texas.

In a trial survey with 361 participants, the researchers found that search engines, by providing more direct answers to queries, could significantly improve their users' perceptions of search quality, especially for those queries that did not return many relevant pages. "Our findings suggest that search engines can be extended to directly respond to a large new class of queries," stated the paper describing the work, entitled "Direct Answers for Search Queries in the Long Tail."

The range of answers search engines provide could be radically expanded at relatively little additional cost, the researchers argued. The key would be to harness the power of crowdsourcing, or contracting people to identify the answers to simple but frequently asked questions.

Today, search engines will only provide direct answers to a small subset of queries, namely those that get asked often. In these cases, the search engine provides the actual answer to the question, rather than just a link to where the answer could be found. With such popular questions, search engine companies find it worthwhile to devote engineers to manually craft program code to identify each question, and then find and supply the answers. "These kinds of answers are only available to popular queries, because search engines have to put a lot of effort into them," Bernstein said.

The number of direct answers provided to users could be expanded at minimal cost, the researchers argue. About 50 percent of the queries that search engines get are completely novel, Bernstein said. But the rest are questions that are asked repeatedly. At least some of these queries have answers that can easily be generated and then checked through simple crowdsourcing.

"We are focusing on a set of queries that are somewhat popular," Bernstein said. "We can create thousands of these answers." In the future, a search service could provide direct answers to many additional questions, such as how to shut down a stalled Apple Mac computer, what the average body temperature is for a dog, how to bake a potato, or how to play the Rummy 500 card game.

In a trial experiment, the researchers had data mining software comb through 75 million search queries from Microsoft's Bing search engine, looking for those queries that resulted in a click-through to a single site. They then identified those queries that could be succinctly answered and contracted workers to quickly craft simple answers and proofread the work. They found these workers through Amazon's Mechanical Turk, by way of a third-party service called CrowdFlower.
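
The first step described above, finding repeated queries whose clicks land on a single site, can be illustrated with a short sketch. This is not the researchers' code: the log format (pairs of query and clicked URL), the function name single_site_queries, and both thresholds are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def single_site_queries(click_log, min_count=3, min_share=0.9):
    """Return {query: site} for repeated queries whose clicks mostly go to one site.

    click_log is an iterable of (query, clicked_url) pairs. The thresholds
    min_count and min_share are hypothetical values, not figures from the study.
    """
    clicks_by_query = defaultdict(Counter)
    for query, url in click_log:
        # Normalize the query and reduce the URL to its host name.
        site = urlparse(url).netloc.lower()
        clicks_by_query[query.strip().lower()][site] += 1

    candidates = {}
    for query, sites in clicks_by_query.items():
        total = sum(sites.values())
        top_site, top_clicks = sites.most_common(1)[0]
        # Keep queries asked often enough whose clicks concentrate on one site.
        if total >= min_count and top_clicks / total >= min_share:
            candidates[query] = top_site
    return candidates


# Made-up log entries for illustration only.
log = [
    ("dog body temperature", "https://pets.example.com/dog-temperature"),
    ("dog body temperature", "https://pets.example.com/dog-temperature"),
    ("Dog body temperature", "https://pets.example.com/faq"),
    ("cheap flights", "https://a.example.com/"),
    ("cheap flights", "https://b.example.org/"),
    ("cheap flights", "https://c.example.net/"),
]
result = single_site_queries(log)
print(result)
```

In this toy log, only the dog-temperature query qualifies, since its clicks all go to one site while the flight query's clicks are spread across three. A real pipeline would still need the later steps the researchers describe: judging whether the query can be answered succinctly, then having workers write and proofread the answer.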

By automating as much of the process of creating the content as possible, search engines can keep their costs minimal. Search engines could contract out the manual labor on a piecemeal basis, using services such as Amazon's Mechanical Turk. The researchers identified about 20,000 queries that could be easily provided with answers. They estimated it would cost search engines about 44 cents to provide a simple answer for each query.

Bernstein admitted that this approach, should it be used, would raise a number of issues. For one, search engines would have to filter out incorrect information somehow. Also, search engines would risk the ire of Web site owners, who would complain that the answers deprive them of Internet traffic, because the search engine itself is providing the answer. "We have to ask ourselves whether we are going too far," he said.

