Enterprise Search: A different ball game for Google
Google's Global Product Manager for the Google Search Appliance, Cyrus Mistry, spoke with Computerworld Australia editor Trevor Clarke about enterprise search and why it is a different game to Web search.
What is enterprise search and how does it differ from Web search?
Cyrus Mistry (CM): Enterprise search was created precisely because there are differences. I would love to say Google was brilliant and simply decided to create this enterprise search product, but it didn't work that way. We got the question probably 150 times before we did it, from CEOs and CIOs asking whether they couldn't just have Google for their company. You've probably heard people say, 'Why can't we just have Google?' We of course looked and said we only have Web search, and there are differences.
So first of all, the search algorithms, meaning the ranking functions, are going to be different. On the Web, you put out a fantastic article and 40,000 people blog about it. That is going to really help your article and its relevance. But within a company I am guessing you don’t have 40,000 internal wikis and blogs pointing to that document. So rankings have to be optimised differently; that is the first thing.
The second thing is, we couldn't answer this call for enterprise search until we addressed the security issue. You might get access to these 50 documents, but the CEO gets access to payroll information as well. So we had to make sure that when you do a search you see everything you are authorised to see, while others see only what they are authorised to see. That security had to be airtight. Then finally we had to address the big question they had: 'Can't we just have Google?' What did they mean by that? When we asked, they generally meant two things. Number one, they wanted the same easy-to-use interface. And number two, they wanted the results really fast. That required some work. One of the reasons Web search is so fast is that we can massively parallelise the work across hundreds of thousands of servers at Google. You can't necessarily do that at Computerworld or TV New Zealand or wherever.
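As a rough illustration of the security point above, the idea of showing each user only what they are authorised to see can be sketched as a permission filter applied to search matches. This is a minimal sketch under stated assumptions, not a description of how the Google Search Appliance works: the `Document` class, the group names, the example URLs and the `search` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical indexed document with an access-control list."""
    url: str
    title: str
    allowed_groups: frozenset


def search(index, query, user_groups):
    """Return titles that match the query AND that the user may see.

    A document is visible when the user belongs to at least one of
    the groups listed on that document.
    """
    q = query.lower()
    return [
        doc.title
        for doc in index
        if q in doc.title.lower() and doc.allowed_groups & user_groups
    ]


# Hypothetical two-document intranet index.
index = [
    Document("http://intranet.example/handbook", "Staff handbook",
             frozenset({"staff", "executives"})),
    Document("http://intranet.example/payroll", "Staff payroll report",
             frozenset({"executives"})),
]

employee_results = search(index, "staff", frozenset({"staff"}))
ceo_results = search(index, "staff", frozenset({"staff", "executives"}))
```

The same query returns one title for the ordinary employee and two for the CEO, because the payroll report is restricted to the hypothetical `executives` group.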
If all the content to be indexed was publicly available, what difference would there be between using a GSA and 'site:' search?
There is a big difference. Many people don't even know the 'site:' thing – I would say maybe one in 10,000 people knows what it is. The problem with 'site:' is a couple of things. The first one is that a lot of people have publicly available content, but not all of their content is public. If you have any kind of extranet content, a partner that logs in to see information, or maybe visa.com or discover.com where people can go and see their information, that would not be searchable because we couldn't get to it. Number two, you are at the mercy of Google. If they feel like crawling you, they will. If you are CNN.com we are going to crawl you pretty frequently.
Number three, we may or may not get every one of your pages indexed. We did release a hosted search solution for enterprise called Google Site Search. It is different from 'site:' because we actually create a special index for you to make sure we crawl it all. Also, we have a crawl frequency guarantee that we will crawl your content within 24 hours if you tell us to crawl it. We can give you more guarantees, and the US price starts at about $100 a year. That is for smaller websites. Larger ones, such as UK Parliament, are not in the $100 range, but if you go to UK Parliament and do a search, or go to eHealth and do a search, they are all powered by the Google Site Search solution.
How do you count documents for the pricing information?
The number of pages in a document doesn’t count against you. In other words, a file can have 20,000 pages and it would still count as one document. So a document is a unique file, or you can think of it as a URL. We will index files up to 30MB – beyond that it is questionable because it is a really large document. We also have a way of doing things outside of this licence count, outside of the per-document count. You can actually pull real-time information and there is no extra cost for that. So, for instance, in media let’s say you have a huge database of all the advertisers who place orders for quarter-page ads, full-page ads, whatever. You can actually search that database and pull back the live data right on top of the search results. It’s kind of like on the Web when you see the live answers. You are not just finding documents at a company. You could be finding experts, you could be finding employees, you could be finding orders; these are all live data, not documents. So those don’t count towards your licensing.
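The counting rule described above, one document per unique file or URL regardless of page count, can be sketched as follows. This is only an illustration under assumptions: the function name, the tuple layout and the example URLs are hypothetical, "30MB" is read here as 30 × 1024 × 1024 bytes, and because the interview does not say how oversized files are licensed, the sketch merely flags them.

```python
# Assumption: the "30MB" figure is interpreted as 30 * 1024 * 1024 bytes.
MAX_INDEXED_BYTES = 30 * 1024 * 1024


def summarise_license_usage(files):
    """Count documents as unique URLs, ignoring page counts.

    `files` is an iterable of (url, page_count, size_bytes) tuples.
    Returns (document_count, oversized_urls). Oversized files are only
    flagged; how they are treated for licensing is not stated in the
    interview.
    """
    unique_urls = set()
    oversized = set()
    for url, _page_count, size_bytes in files:
        unique_urls.add(url)
        if size_bytes > MAX_INDEXED_BYTES:
            oversized.add(url)
    return len(unique_urls), sorted(oversized)


# Hypothetical crawl listing: the manual appears twice and has 20,000 pages.
files = [
    ("http://intranet.example/manual.pdf", 20000, 12_000_000),
    ("http://intranet.example/memo.doc", 1, 40_000),
    ("http://intranet.example/manual.pdf", 20000, 12_000_000),
    ("http://intranet.example/archive.zip", 1, 45_000_000),
]

document_count, oversized_urls = summarise_license_usage(files)
```

Here the 20,000-page manual counts once, so the listing comes to three documents, with the archive flagged as larger than the assumed size limit.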
What is the biggest challenge you face in convincing IT managers to purchase enterprise search solutions? Is part of it convincing people Google is a business option and not just web search for consumers?
There are two challenges with IT. One of the problems is that, unfortunately or fortunately, 80 per cent of IT spend is, according to Gartner, dead money, meaning it is money spent just maintaining their stuff. This is a whole new way of thinking and so you are exactly right, one of the things is we have to convince them they don’t need to do a traditional enterprise software deployment, which is a complete and total nightmare in every way you can imagine. You are talking about many months and consultants on site – nobody actually wants to do that. We have to convince them that yes, it looks really simple and it is truly a box with a power cord and an Ethernet cord, but all of the smarts are there for you; the hard work has been done.
That is the first one. The second one is yes, this whole question of whether Google is serious about the enterprise business or is just Web keyword search. I can tell you the main reason we launched Side by Side was to put all that to rest once and for all. Anyone who might have that idea can just try it and find out for themselves which results are better, and there is a huge margin of difference. And finally, the big thing we have to convince people of, so I guess there are three things, is the ROI on doing this. Even though Google is the less expensive option, we still need to make sure business or IT go out and do a survey to find out how much time people spend looking for information and what the return would be.
Who do you see, particularly in the APAC region as your biggest competitors and why?
I never like to say disparaging things about competitors – they are all good and bring different things to the table. Actually the largest competitor, strangely, and this is from data and not just what we think, is no search. I’m not trying to get around the answer, but 75 per cent of companies do not have an enterprise search solution in place. So rather than worrying about what others are doing, and we do of course know what they are doing, we see a huge greenfield, a huge white space, and we like to focus on innovation – getting our engineers doing great stuff and giving our users great products. Absolutely all vendors have something to offer in that space.
Is there anything you see other vendors doing that you think Google should also be doing?
One of the difficulties, and this is going to sound mean, is getting hold of some of these solutions to look at, because a lot of them are monolithic, major enterprise software deployments. So what you have to do is talk to customers that have them. Usually when we are talking to them they are switching from another one to us. I don’t have any specifics on things we would like. One of the big innovation engines we look at to come up with ideas is truly on the consumer side. We have more search engineers than any other search company has total employees. Because of that, our rate of innovation tends to outpace that of the other players. I don’t want to say it is more of the opposite, but it probably is.
What will be the top three improvements in the next generation of enterprise search from Google’s perspective?
I would say the top three improvements are, number one, you are going to see some really neat things coming around search quality. Search quality is the more academic name for the field of what you actually return to the user. It is not, I think, enough to return a whole long list of documents. You are going to start seeing some really creative ways to generate an answer. I personally have been championing, internally and externally, that the ultimate goal of a search engine is always to give you an answer; it is never to give you a long list of documents. The opponents will always say, what if I just want to sort through, say, 50 documents? I don’t think you ever want to sort through 50 documents. I am guessing you want a synthesis: you want to ask what all these documents are saying about a topic, and the search engine should provide you with that synthesised information. So the first area of innovation will be around search quality and all the natural language processing work that is coming.
The second piece is more around scale. One of the things I haven’t talked about is our 6.0 release and how you can add more appliances and scale. We actually have the ability to scale to a billion documents, for instance. Finally, you will see a lot of innovation in the area of reach, which is the concept that we want to continue to get to more and more different content systems, especially in the next 12 months. We can index SharePoint documents and FileNet, Livelink, et cetera. But now we want to get to those long-tail, really complicated systems out of the box. We can do them, but there is custom work involved to get to those harder systems, so we want to simplify it.
And for the developer community – can they expect any enhancements in the near future?
Yes, we do quite a bit for the developer community. If you go to code.google.com you can do a search for enterprise search development stuff. The second area where they do a lot of development is OneBoxes: if you do a search for the OneBox Gallery, you can see what our partners have done, and they are creating these really neat OneBoxes. One of the benefits of Google having this really large footprint of customers is that the tables are turned and now these guys want to tell us how they should integrate with us. Once you get that critical mass, that tends to happen. Finally, on Enterprise Labs, I’d personally like to release some new things for the developer community as open source every two to three months. If you go to Enterprise Labs as the third place, you can see stuff for developers there.
Would it be fair to say that a good deal of the potential for your enterprise search success is reliant upon having a big community of ISVs, SIs and open source developers?
Partners are absolutely paramount to our success. We have a great partner organisation in the Australia and New Zealand region. In fact, all of our business goes through partners over here. Because of that, not only are they critical to our success, they are our success if that makes sense. They are excited, they are fired up. There are a few, in fact, that have gone on exclusively with Google, meaning they don’t sell or deploy anything else. And that is what we love to hear – this is now enough of our business that we can just shut down all other operations. So I would absolutely agree, both on the reseller side and on the open source side.