From: www.itworld.com
May 12, 2008
The public will get its first chance Monday to test a search engine from start-up
Powerset, which eschews conventional keyword technology and is instead designed
to understand the meaning of Web pages.
Powerset's search engine therefore holds the promise of fundamentally changing
people's expectations of search engines by, in theory, offering a smarter,
more efficient experience.
However, Powerset's beta version, while delivering impressive results, has
a limited scope and index, leaving unanswered questions about its ability to
work its magic at the massive scale of Google's keyword-based search engine.
"We're changing the way information is searched by doing a much deeper
analysis of the pages we index," said Scott Prevost, Powerset's product
director.
Keyword engines treat pages as bags of words, indexing their content without
grasping its meaning, he said. By contrast, Powerset's engine, which applies
technology developed in-house as well as technology licensed from Xerox's PARC
subsidiary, creates a semantic representation of each page by parsing every
sentence and extracting its meaning. "Meaning is what we index," he said.
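To illustrate what treating pages as "bags of words" means, here is a minimal Python sketch of a toy keyword index. It is purely illustrative and is not code from Powerset, Google or any real search engine; the function names (bag_of_words, build_index, keyword_search) and the sample pages are hypothetical. Each page is reduced to word counts, so word order and sentence structure are thrown away.

```python
from collections import Counter, defaultdict

def bag_of_words(text):
    """Reduce a page to word counts, discarding word order and sentence structure."""
    return Counter(text.lower().split())

def build_index(pages):
    """Toy inverted index: map each word to the set of page IDs containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in bag_of_words(text):
            index[word].add(page_id)
    return index

def keyword_search(index, query):
    """Return the IDs of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical sample pages with opposite meanings but identical words.
pages = {
    "p1": "Dog bites man",
    "p2": "Man bites dog",
}
index = build_index(pages)

# Both pages reduce to the same bag, so the keyword index cannot tell them apart.
print(bag_of_words(pages["p1"]) == bag_of_words(pages["p2"]))  # True
print(sorted(keyword_search(index, "man bites dog")))  # ['p1', 'p2']
```

Telling the two pages apart requires information beyond word counts, such as the sentence-level parsing Prevost describes.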
In an interview in October with IDG News Service, Marissa Mayer, Google's vice
president of Search Products & User Experience, acknowledged that the company's
search engine should -- and will -- overcome its keyword dependence in time.
"People should be able to ask questions and we should understand their
meaning, or they should be able to talk about things at a conceptual level.
We see a lot of concept-based questions -- not about what words will appear
on the page but more like 'what is this about?'. A lot of people will turn to
things like the semantic Web as a possible answer to that," she said.
But she added that Google's search engine acts smart thanks to the humongous
amount of data it crunches. "With a lot of data, you ultimately see things
that seem intelligent even though they're done through brute force," she
said. As an example, she cited the query "GM," which the engine interprets
as "General Motors," whereas for the query "GM foods" it delivers results
for "genetically modified foods." "Because we're processing
so much data, we have a lot of context around things like acronyms. Suddenly,
the search engine seems smart, like it achieved that semantic understanding,
but it hasn't really," she said.
For now, Powerset's index is very limited, consisting only of millions of pages
from Wikipedia and from Freebase, Metaweb Technologies' Web-based structured
database of information. However, Prevost vows that the index will begin growing
within a month of the launch and will eventually rival those of Google, Yahoo
and others in size. "Our technology fully scales," he said.
Still, it's impressive to see Powerset's search engine in action, along with the
promise it holds. Instead of returning the proverbial 10 blue links as search
results, Powerset can do more, such as assembling a collection of facts related
to the query and summarizing the information it finds. It can also provide
direct answers to factual questions.
Because the content from Wikipedia and Freebase can be republished, Powerset
can remain relevant after a user clicks through to a search result, by providing
an outline for navigating the page and a summary of its facts. Powerset could
not do this with copyrighted content, of course, but the company will seek
partnerships with publishers to obtain permission, Prevost said. "We
think it'll be a situation where publishers will want their content to be served
up in this way," he said.
Industry analyst Greg Sterling of Sterling Market Intelligence calls Powerset's
capabilities "impressive" and particularly likes its search results interface.
"What they've created
is both a better search engine for Wikipedia and a massive 'proof of concept'
for their algorithm and technology," he said in an e-mail interview.
Now Powerset has to prove that its search engine can scale, delivering results
from an index of billions upon billions of Web pages while serving millions of
concurrent users. "There's certainly potential there to build a better mousetrap,
it would appear. But bringing what Powerset has done for Wikipedia to the entire
Internet seems an enormous challenge that will take both time and lots of additional
resources," Sterling said.
Prevost acknowledges that this type of deep processing takes a lot of
computational power, although once the pages are indexed, retrieving their
information doesn't pose any special challenge.
Powerset also faces the challenges of a start-up technology company, such as
generating revenue and going through growing pains. The company has already
had some management upheaval: in November it announced the departure of
co-founder and Chief Operating Officer Steve Newcomb, as well as a search for a
CEO after co-founder Barney Pell gave up that post to become chief technology
officer. "The
CEO search is still in process, but we have a strong internal management structure
and board of directors," he said.
Prevost said the company's investors are committed to it and to ensuring that
it has the resources needed to scale up the search engine to the level of
engines with indexes of 20 billion pages.
Powerset's business model is based on advertising, although the search engine
will not serve ads at launch. "There's a lot of cool stuff
we can do in the ad space by matching the meaning of queries to the relevance
of ads, but that's much more longer term," he said.
The search engine will be limited to Web search at first, although Powerset
has contemplated adding specialty engines for things like images and video later,
as well as targeting verticals such as health, product reviews and travel, he
said.
"We've only shown the tip of the iceberg in language analysis," he
said.
IDG News Service