From: www.itworld.com

Souped-up Search

by Mathew Schwartz

April 12, 2001

It's Feb. 1, the score is tied, 83-83, and under a deafening roar, University of North Carolina (UNC) basketball center Brendan Haywood steps to the free-throw line with only 1.2 seconds remaining in a game on archrival Duke University's home court. Haywood sinks both free throws, and after Duke misses a last-second basket attempt, UNC wins. For those teams' rabid fans, it's the biggest upset in the history of the heated matchup since -- well, since when? To find the answer, fans might take to searching the Web for statistics on the longtime rivalry.

Until recently, they wouldn't have found answers easily; searching for statistics on the Web has been futile. Though the information is somewhere out there, it's probably on multiple, unconnected pages. No search engine has been able to stitch the relevant statistics together.

But new technology is beginning to change the rules for searches. For the first time, users schooled in the intricacies of the Boolean search language's ands and nots are using natural language. They're also relying upon tools that can recognize images, search statistical databases and extract relevant information from unconnected sources. These features are appearing not only in online search engines such as Excite@Home or Seattle-based sports site ESPN.com, but also in venerable enterprise content-management software, for searching corporate documents and knowledge bases.

Point technology improvements are steps toward a greater goal. Someday, a search engine will be able to intelligently extract context from any question, find the information it needs from various sources and then present it in a usable format. Susan Feldman, an analyst at IDC in Framingham, Mass., calls this state of search nirvana "the answer machine" -- ask anything on a search engine, and it finds the answer. Until that halcyon day, there are some new point technologies that could make corporate and Web searching slightly easier.

Fun With Numbers

Given the importance of statistics in sports, it's no wonder that ESPN.com, which receives about 2.5 million page views per day, wanted a statistics-friendly search engine for its site. For years, the company had been asking, "How do we get into a less-structured query environment?" says Geoff Reiss, senior vice president of programming, production and operations at ESPN Internet Group, which is part of ABC Inc. in New York.

The problem is that most search engines can't analyze charts and databases; they can only note frequency of words. But word groups don't necessarily add up to real concepts.

ESPN began using services from Fact City Inc. in Waltham, Mass., last year to enable searches for statistics from professional sports leagues or college athletics.

"What we can do now is create comparisons and [provide] context," says Reiss. The ultimate goal, he adds, is to replicate the moment "when a baseball fan picks up the baseball encyclopedia and gets lost in the serendipity of it." Although that isn't yet a reality, during last year's NCAA college basketball tournament, some ESPN users literally spent hours searching through the Web site's collections of statistics.

Fact City functions like an application service provider. First, it writes a data dictionary that explains the relationships between things in a given database. So if a user requests "Walter Payton career yards," the search engine knows to reference football statistics and display the former Chicago Bears running back's passing, receiving and rushing yards.
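
The data-dictionary idea described above can be sketched in a few lines. This is only an illustration under assumptions, not Fact City's implementation: the table names, column names, the "example player" entry and its values are all hypothetical placeholders, and the phrase matching is deliberately naive.

```python
# Hypothetical data dictionary: maps phrases a user might type to the
# table and columns that hold the answer. All names are illustrative.
DATA_DICTIONARY = {
    "career yards": {
        "table": "football_career_stats",
        "columns": ["passing_yards", "receiving_yards", "rushing_yards"],
    },
    "career points": {
        "table": "basketball_career_stats",
        "columns": ["points"],
    },
}

# Toy database with placeholder values (not real statistics).
DATABASE = {
    "football_career_stats": {
        "example player": {
            "passing_yards": 100,
            "receiving_yards": 200,
            "rushing_yards": 300,
        },
    },
    "basketball_career_stats": {},
}


def answer(query):
    """Resolve a free-text query to the columns named by the data dictionary."""
    text = query.lower()
    for phrase, target in DATA_DICTIONARY.items():
        if phrase in text:
            # Whatever remains after removing the phrase is treated as the entity.
            entity = text.replace(phrase, "").strip()
            row = DATABASE[target["table"]].get(entity)
            if row is None:
                return None
            return {col: row[col] for col in target["columns"]}
    return None


print(answer("Example Player career yards"))
```

The dictionary, rather than the user, carries the knowledge that "career yards" belongs to football and spans three columns.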

Fact City receives search requests from client Web sites and then sends the answers back via Web protocols HTTP and XML so client sites can display results via their own Web pages and maintain site design. Fact City works with free databases such as the CIA's World Factbook; proprietary corporate databases for in-house searches; and licensed proprietary databases, like that of Zagat Survey LLC in New York for restaurant reviews.
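
The round trip described above, in which the service returns XML and the client site renders it in its own design, might look roughly like the following sketch. The element names (`result`, `query`, `stat`) and both function names are assumptions made for illustration; the article does not describe Fact City's actual XML format.

```python
import xml.etree.ElementTree as ET
from html import escape


def build_response(query, stats):
    """Service side: package an answer as XML (hypothetical schema)."""
    root = ET.Element("result")
    ET.SubElement(root, "query").text = query
    for name, value in stats.items():
        stat = ET.SubElement(root, "stat", name=name)
        stat.text = str(value)
    return ET.tostring(root, encoding="unicode")


def render_html(xml_text):
    """Client side: turn the XML answer into markup in the site's own style."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><td>{escape(s.get('name'))}</td><td>{escape(s.text)}</td></tr>"
        for s in root.findall("stat")
    )
    return f"<table>{rows}</table>"


# Placeholder value, not a real statistic.
xml_text = build_response("example query", {"rushing_yards": 300})
print(xml_text)
print(render_html(xml_text))
```

Because the answer travels as data rather than as a finished page, each client site decides how the result looks.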

Users of Web search engines, corporate knowledge tools or even local library catalog computers know that every search engine seems to use different input rules. Some tools require the word not to exclude certain results from a query, while others require a minus sign. The whole approach is flawed. Search engines "give you what you ask for, but the critical problem is most people don't know how to write the right question," says Feldman.

The solution is natural language processing, which can interpret user requests by comparing them against dictionaries of definitions and concepts, thus eliminating the need for special query terms. That way, users searching for information about "high blood pressure" would benefit from medical literature that instead classifies it as "hypertension."
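
A minimal sketch of the concept-dictionary step, assuming a hand-built synonym table: the query is expanded to every phrase that names the same concept before documents are matched. Real natural language processing does far more than this. The `CONCEPTS` table, the function names and the sample documents are hypothetical.

```python
# Hypothetical concept dictionary: each concept lists the phrases that name it.
CONCEPTS = {
    "hypertension": {"hypertension", "high blood pressure"},
}


def expand_query(query):
    """Return the query plus every synonym of any concept it mentions."""
    text = query.lower()
    terms = {text}
    for synonyms in CONCEPTS.values():
        if any(s in text for s in synonyms):
            terms |= synonyms
    return terms


def search(query, documents):
    """Keep documents that contain any of the expanded terms."""
    terms = expand_query(query)
    return [d for d in documents if any(t in d.lower() for t in terms)]


docs = ["Treating hypertension in adults", "Knee surgery recovery"]
print(search("high blood pressure", docs))
```

The first document is returned even though it never contains the words the user typed.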

The goal, of course, is to be able to enter into a search engine a request such as: "Discern the recent hiring habits of my five biggest competitors."

Though that ability doesn't yet exist, the necessary pieces are beginning to appear. For instance, new software from iPhrase Technologies Inc. in Cambridge, Mass., lets companies search through their structured information (databases) as well as their unstructured information (documents and Web pages).

"It uses a natural language search to find the most appropriate page within a site, or if they can't find it, they fabricate a page from site information," notes Guy Creese, an analyst at The Yankee Group in Boston.

At The Charles Schwab Corp. in San Francisco, Bob Sofman, senior vice president of the firm's electronic brokerage, says the iPhrase engine running on Schwab.com is often used to quickly compare the market capitalization, price/earnings (p/e) ratio and revenues for different companies. For instance, the engine can respond correctly to queries such as "find the p/e ratios of the top five trading stocks today" and return all relevant information, report-style, on one page.

Release the Hounds

While search engines are getting better at indexing text and documents with ease, when it comes to searching images and video, most engines are stuck searching the text that describes the image. This lackluster, secondhand information is often absent or vague.

However, new search engine software, such as that from Ereo Inc. in Westminster, Colo., is beginning to analyze the actual images on a Web page in order to deduce what's in them. The technology can even be used for blocking or finding nude images on the Web.

Search portal Excite@Home, also known as At Home Corp. in Redwood City, Calif., is using Ereo's software to let users search the Web for images.

Another early user is Minden Pictures Inc., an Aptos, Calif.-based stock photography agency with more than 250,000 wildlife images from a select group of National Geographic photojournalists.

The company's president, Larry Minden, says he's using Ereo to let people who search his Web site narrow their searches visually, rather than textually. When users find an image, they can click "more like this," so even if they don't know what they're looking at -- a Bengal tiger, as opposed to a Siberian, for instance -- they'll get the results they're looking for.

Instead of requiring that every image be named, Ereo can actually analyze images to understand what they are. In other words, with some tweaking, not only can it differentiate between breeds of dogs, but it can also tell if a dog is running or sitting.

"There's a default weighting of different criteria, but if you have different needs or interests, you can further weight the color or hue, as opposed to just the meta tags," says Minden. Other adjustable search criteria include shapes, textures and backgrounds.
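
The adjustable weighting Minden describes can be pictured as a weighted sum of per-criterion distances between images. The sketch below assumes each image has already been reduced to small numeric feature vectors. The vectors, the criterion names and the functions are invented for illustration and say nothing about how Ereo actually analyzes pixels.

```python
DEFAULT_WEIGHTS = {"color": 1.0, "shape": 1.0, "texture": 1.0}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def score(query, candidate, weights=DEFAULT_WEIGHTS):
    """Weighted sum of per-criterion distances; lower means more similar."""
    return sum(w * distance(query[c], candidate[c]) for c, w in weights.items())


def more_like_this(query, catalog, weights=DEFAULT_WEIGHTS):
    """Rank catalog entries from most to least similar to the query image."""
    return sorted(catalog, key=lambda name: score(query, catalog[name], weights))


# Toy feature vectors (hypothetical values).
query = {"color": [1.0, 0.0], "shape": [0.0], "texture": [0.0]}
catalog = {
    "a": {"color": [1.0, 0.0], "shape": [1.0], "texture": [0.0]},
    "b": {"color": [0.5, 0.0], "shape": [0.0], "texture": [0.0]},
}

print(more_like_this(query, catalog))
print(more_like_this(query, catalog, {"color": 3.0, "shape": 1.0, "texture": 1.0}))
```

With the default weights, image "b" ranks first. Once color is weighted more heavily, image "a", whose color matches exactly, moves to the top.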

No search tool is perfect, but improved searches can improve revenue. For instance, image requests for "horse" can return many pages of results. Minden says customers typically will look only at a handful of results pages and then either phone his company or go elsewhere. So "if you can find a way that's going to bring what you really need up to the top five pages [of results] from 500 pages, you're going to have much better customer satisfaction," says Minden.

Future Search

New technologies are going even further than image searching. Los Angeles-based Oingo Inc.'s search site gives users drop-down boxes to select what they really mean to say. "It's very simple to go back to a user and ask, 'Do you want a 'jaguar', the animal, or a 'Jag' car?' " says Feldman.

Concept mapping, which is just beginning to emerge, can understand concepts -- that pool hall also means billiards but not in-ground pool -- to produce better results.
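
Both ideas, asking the user which sense is meant and inferring the sense from surrounding words, can be sketched with a small sense table. This is an assumption-laden toy, not Oingo's or any vendor's method: the `SENSES` table and its cue words are hypothetical.

```python
# Hypothetical sense inventory: each ambiguous term lists its senses and
# the context words that point to each one.
SENSES = {
    "pool": {
        "billiards": {"hall", "cue", "table"},
        "swimming pool": {"in-ground", "swim", "backyard"},
    },
    "jaguar": {
        "animal": {"cat", "jungle", "wildlife"},
        "car": {"dealer", "sedan", "engine"},
    },
}


def candidate_senses(query):
    """For each ambiguous term in the query, list the senses still in play."""
    words = set(query.lower().split())
    result = {}
    for term, senses in SENSES.items():
        if term not in words:
            continue
        matched = [s for s, cues in senses.items() if cues & words]
        # No context clue: return every sense so the user can choose.
        result[term] = matched or list(senses)
    return result


print(candidate_senses("pool hall"))
print(candidate_senses("in-ground pool"))
print(candidate_senses("jaguar"))
```

When context settles the question, one sense comes back. When it does not, the full list is what a drop-down box would present.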

For instance, tools from firms such as ClearForest Corp. in New York and Solutions-United Inc. in Syracuse, N.Y., can analyze text and decode context, such as whether an e-mail has an irate tone, to determine a customer's frustration levels. They can also tell whether someone is searching for a printer for their children or for their home office.

Xerox Palo Alto Research Center spin-off Inxight Software Inc. in Santa Clara, Calif., just launched Categorizer, a program that can assign corporate documents to predefined subject categories to produce a Yahoo-like hierarchy. That's useful for an analyst who only needs a subset of 200 documents but would otherwise have to read them all. Documents aside, someday, analysts will be able to just read -- and trust -- the summary.
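
Assigning documents to predefined categories can be illustrated with simple keyword overlap. This is a stand-in technique chosen for brevity, not a description of how Categorizer works. The category names, keywords and sample documents are hypothetical.

```python
# Hypothetical predefined categories and the keywords that signal each one.
CATEGORIES = {
    "finance": {"revenue", "earnings", "budget"},
    "legal": {"contract", "liability", "compliance"},
}


def categorize(text):
    """Pick the category sharing the most keywords with the text."""
    words = set(text.lower().split())
    scores = {name: len(keywords & words) for name, keywords in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def build_hierarchy(documents):
    """Group document titles under their assigned category."""
    tree = {}
    for title, text in documents.items():
        tree.setdefault(categorize(text), []).append(title)
    return tree


docs = {
    "Q1 report": "Revenue and earnings rose",
    "Vendor terms": "The contract limits liability",
    "Lunch menu": "Soup and salad",
}
print(build_hierarchy(docs))
```

An analyst interested only in legal material could then open that one branch instead of reading every document.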


How to Protect Your Staff

Layoffs, budget cuts and leaner times can wreak havoc on your IT department. The following are some ways to protect your staff during tough economic times:

Be upfront and honest about possible layoffs. Keep your staff informed about impending layoffs so they have time to prepare both emotionally and professionally. Open communication is crucial to maintaining morale.

Align your department with your organization’s strategic plan. Make sure you and your team are working toward your company’s larger strategic goals, rather than going off on another path.

Make training a priority. To keep employees happy and motivated, training is essential even when budgets are tight.

Don’t overstaff. If you have certain projects that require specific skills, consider hiring contractors who can fill in when workloads increase. If you hire full-time employees during the boom times, you’ll probably end up having to lay off more people when a slowdown occurs.

Don’t panic. Sure, the economy appears to be slowing down, but you should stay calm. Most IT departments are the last to feel the effects of major corporate cost-cutting efforts. Just be sure you can justify spending and expenses if you are asked.

The Quest for Easier Answers Continues

New search engines are changing the rules for searches of Web sites and corporate databases so users can more easily obtain useful information.

Company | Killer Feature(s)
ClearForest | Context analysis
Ereo | Pixel-level image analysis for recognition
Fact City | Searching statistical information
iPhrase | Natural language; stitching multiple results together
Inxight | Categorizer files documents in hierarchy
Oingo | Drop-down boxes for context
Solutions-United | Context analysis