February 28, 2012, 11:56 AM —
The secret origin of big data lies in the massive firehose of data born from the Web and the role it plays in ecommerce.
So it's only fitting that a company like BloomReach is using big-data analytical techniques to turn the model for delivering content to consumers on its metaphorical head with its new Web Relevance Engine product stack.
BloomReach is a new company launched this week around the Strata Conference in the hopes of catching the wave of new big data ventures coming out of this event. What makes BloomReach interesting, though, is the decidedly narrow focus of how it applies analytics to the web.
Web analytics is a tricky game. In the past, it was very much reactive: analysts would monitor the logs of web sites and use them to determine (decidedly not in real time) which pages or products were doing the best, and then modify the web site to drive traffic to that content. As data analytics improved, it began to approach actual real time. Now sites like Amazon can measure page views and customer interest and get hot products noticed more quickly.
But even as this approach grows more sophisticated, one thing doesn't change much, if at all: the content itself. While the placement of content can be different and adjusted, the basic core content is not modified based on its true relevancy to the customer.
This is the gap that BloomReach is trying to address: monitoring consumer experience and web site data closely, and then modifying content on the fly to make that content more accessible for the end user. This approach is, as CEO Raj De Datta puts it, "making sure every page is as relevant as possible."
De Datta explained that the content isn't completely altered; you won't see a page completely customized for the end user. Instead, when a customer comes to a site that's connected to BloomReach's hosted service, BloomReach uses the data it picks up about the customer and her habits, as well as past behavior on the site, to adjust the searches the user makes and come up with better results.
The example De Datta gave was a simple one: say you have an ecommerce site with one red sweater and one striped sweater. If a user searches for "red striped sweater," whether on one of the major portals (Google, Bing, Yahoo) or on the ecommerce site itself, the relevant pages may not show up under normal circumstances. That's because it's nearly impossible for a human being to go through every possible semantic combination of products available.
But not for the Web Relevance Engine. By gathering all of this data, the BloomReach software can figure out semantic combinations such as this and actually deliver the relevant page for the customer's search.
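To make the sweater example concrete, here is a minimal sketch in Python. It is not BloomReach's method, which the company did not detail; it only illustrates why a strict keyword search can miss both products while a looser term-overlap match finds them. The catalog, the page names, and the functions strict_match and relaxed_match are all hypothetical.

```python
# Hypothetical catalog: the two sweaters from De Datta's example,
# each described by a set of terms. Not BloomReach's data model.
CATALOG = {
    "red-sweater": {"red", "sweater"},
    "striped-sweater": {"striped", "sweater"},
}


def strict_match(query, catalog):
    """Return pages whose terms contain every query term (plain AND search)."""
    terms = set(query.lower().split())
    return sorted(page for page, tags in catalog.items() if terms <= tags)


def relaxed_match(query, catalog, min_overlap=2):
    """Return pages sharing at least min_overlap terms with the query, best first."""
    terms = set(query.lower().split())
    scored = [(len(terms & tags), page) for page, tags in catalog.items()]
    # Sort by descending overlap, then by page name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page for score, page in scored if score >= min_overlap]


print(strict_match("red striped sweater", CATALOG))   # no page has all three terms
print(relaxed_match("red striped sweater", CATALOG))  # both sweaters share two terms
```

In this toy version the strict search returns nothing for "red striped sweater," while the relaxed match surfaces both sweaters. A production system would presumably learn which term combinations matter from behavior data rather than rely on a fixed overlap threshold.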
"Most web sites see 25 percent of their pages found through natural search," De Datta said. "After a few months with BloomReach, our customers are seeing 75 percent of their pages exposed."
The Web Relevance Engine stack is actually three products: the BloomSearch product that performs the functions described above; the BloomLift application, which plugs into ad campaigns to deliver more relevant ad pages; and the BloomSocial software, which ties in social curation for delivering relevant content.
This has not been a quick, slapdash project. According to De Datta and his team, the company has been working for three years on this solution, which uses Hadoop stores, Cassandra, and a custom query language to drive the products, CTO Ashutosh Garg explained.
"We are running 1,000 Hadoop jobs and hitting over a billion pages per day," Garg said. He added that the Web Relevance Engine can parse up to 10 billion phrases within its semantic map.
The long run-up to launch was intriguing, but it really just showed how complex the problem BloomReach is trying to solve is.
"This is a lot of machine learning, on a massive web scale," explained Head of Marketing Joelle Kaufman. The three-year process was a necessary interval for making sure the final product could deliver on its promise.
BloomReach will be one of many vendors out there trying to apply big-data techniques to the area of web marketing and commerce. Its hosted subscription service should be an interesting one to watch.