Enterprise Search: A different ball game for Google

By Trevor Clarke, Computerworld Australia | Internet, enterprise search, Google

Google's Global Product Manager for the Google Search Appliance, Cyrus Mistry, spoke with Computerworld Australia editor, Trevor Clarke, about enterprise search and why it is a different game to Web search.

What is enterprise search and how does it differ from Web search?

Cyrus Mistry (CM): This is exactly why enterprise search was created: because there are differences. I would love to say Google was brilliant and we simply decided to create this enterprise search product, but it didn't work that way. We got the question probably 150 times before we did it, from CEOs and CIOs asking whether they couldn't just have Google for their company. You've probably heard people say, 'Why can't we just have Google?' We of course looked and said we only have Web search, and there are differences.

So first of all, search algorithms, meaning ranking functions, are going to be different. On the Web, you put out a fantastic article and 40,000 people blog about it. That is going to really help your article and its relevance. But within a company I am guessing you don’t have 40,000 internal wikis and blogs pointing to that document. So rankings have to be optimised differently; that is the first thing.
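To make the link-signal point concrete, here is a minimal Python sketch that orders pages purely by how many other pages link to them. It is an illustration only and not Google's ranking function; the function name `rank_by_inbound_links` and the sample data are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Order pages by inbound link count, highest first.

    `links` maps each source page to the list of pages it links to.
    """
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each source once per target
            if target != source:     # ignore self-links
                inbound[target] += 1
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(inbound, key=lambda page: (-inbound[page], page))


# Hypothetical data: on the Web, many blogs point at a popular article...
web_links = {
    "blog-a": ["article"],
    "blog-b": ["article"],
    "blog-c": ["article", "other"],
}
# ...while inside a company almost nothing links to a given document.
intranet_links = {"wiki-home": ["policy.doc"]}

print(rank_by_inbound_links(web_links))
print(rank_by_inbound_links(intranet_links))
```

With so few internal links, a signal like this separates intranet documents poorly, which is one way to read the remark that rankings have to be optimised differently.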

The second thing is, we couldn't answer this call for enterprise search until we addressed the security issue. So you get access to these 50 documents, but the CEO gets access to payroll information as well. It is about making sure that when you do a search you see everything you are authorised to see, but others only see what they are authorised to see. So we had to make sure we had that airtight security. Then finally we had to address the big question they had: 'Can't we just have Google?' What did they mean by that? When we asked, they generally meant they wanted the same easy-to-use interface. And number two, they wanted the results really fast. That required some work. One of the reasons Web search is so fast is that we can massively parallelise the work across hundreds of thousands of servers at Google. You can't necessarily do that at Computerworld or TV New Zealand or whoever.
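The security requirement described above, where each person sees only what they are authorised to see, can be sketched as a filter applied to search results. This is a simplified illustration under stated assumptions and not how the Google Search Appliance implements it; the `allowed_groups` field, the group names and the `visible_results` function are all hypothetical.

```python
def visible_results(results, user_groups):
    """Return only the results the user is authorised to see.

    Each result is a dict with a "title" and an "allowed_groups" set;
    a result is visible if the user belongs to at least one allowed group.
    """
    user_groups = set(user_groups)
    return [r for r in results if r["allowed_groups"] & user_groups]


# Hypothetical search results with per-document access lists.
results = [
    {"title": "Holiday policy", "allowed_groups": {"staff", "executives"}},
    {"title": "Payroll summary", "allowed_groups": {"executives"}},
]

# A regular employee and the CEO run the same query.
print([r["title"] for r in visible_results(results, ["staff"])])
print([r["title"] for r in visible_results(results, ["executives"])])
```

The same query returns different result lists depending on who is asking, which is the behaviour the interview describes.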

If all the content to be indexed was publicly available, what difference would there be between using a GSA and 'site:' search?

CM: There is a big difference. Many people don't even know the 'site:' thing – I would say maybe one in probably 10,000 people know what it is. There are a couple of problems with 'site:'. The first is that a lot of people have publicly available content, but it's not all public. If you have any kind of extranet content, a partner that logs in to see information, or maybe visa.com or discover.com where people can go and see their information, that would not be searchable because we couldn't get to it. Number two, you are at the mercy of Google. If they feel like crawling you they will. If you are CNN.com we are going to crawl you pretty frequently.

Number three, we may or may not get every one of your pages indexed. We did release a hosted search solution for enterprise called Google Site Search, which is purely hosted. It is different from 'site:' because we actually create a special index for you to make sure we crawl it all. Also, we have a crawl frequency guarantee that we will crawl your content within 24 hours if you tell us to crawl it. We can give you more guarantees and the US price starts at about $100 a year. So that is for smaller websites. For the larger ones, for example we have UK Parliament, it's not at the $100 range, but if you go to UK Parliament and you do a search or you go to eHealth and do a search they are all powered by the Google Site Search solution.

How do you count documents for the pricing information?

CM: The number of pages in a document doesn’t count against the license count. In other words you can have 20,000 pages – that would count as one document. So a document is a unique file, or you can think about it as a URL. We will index files up to 30MB – beyond that it is questionable because it is a really large document. We also have a way of doing things outside of this license count, outside of the per-document count. You can actually pull real-time information and there is no extra cost for that. So in other words, if you want to pull information from a database, for instance in media, let’s say you have a huge database of all the advertisers who place orders for quarter-page, full-page ads, whatever. You can actually search that database and pull back the live data right on top of the search results. It’s kind of like on the Web when you see the live real answers. You are not just finding documents at a company. You could be finding experts, you could be finding employees, you could be finding orders; this is all live data, not documents. So those don’t count towards your licensing.
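The counting rule described above can be sketched in a few lines of Python: each unique URL counts once, however many pages it contains. This is an illustration rather than Google's licensing logic; the function names and sample URLs are hypothetical, and the size limit assumes the quoted figure means 30 megabytes.

```python
# Assumption: the indexing limit quoted in the interview is 30 megabytes.
MAX_INDEXED_BYTES = 30 * 1024 * 1024


def count_licensed_documents(items):
    """Count unique URLs; the number of pages per file is ignored."""
    return len({item["url"] for item in items})


def within_index_limit(size_bytes):
    """Return True if a file is no larger than the assumed indexing limit."""
    return size_bytes <= MAX_INDEXED_BYTES


# Hypothetical crawl output: the same 20,000-page report is seen twice.
items = [
    {"url": "http://intranet.example/report.pdf", "pages": 20000},
    {"url": "http://intranet.example/memo.html", "pages": 1},
    {"url": "http://intranet.example/report.pdf", "pages": 20000},
]

print(count_licensed_documents(items))
print(within_index_limit(5 * 1024 * 1024))
print(within_index_limit(45 * 1024 * 1024))
```

Live database lookups of the kind mentioned in the answer would never appear in `items` at all, so they would not affect the count.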
