Banjo plays a big data tune
Billions of records processed in real time
The proliferation of social networks has created a whole host of problems for both end users and IT staffers--problems that one new service is using big data technology to solve.
The story of Banjo seems almost apocryphal in nature: founder and CEO Damien Patton was in Boston's Logan Airport waiting for an outbound flight. A friend Patton hadn't seen in years was waiting for a different flight just one gate over. Both men were actively posting their locations on social networks, but since they were on different social networks, they were oblivious to each other's presence.
This is where the idea for Banjo originated: a service that actively monitors different social networks in real time and correlates location-based activity, so that Banjo can let you know when an acquaintance is nearby, regardless of the network they're on.
A location-based aggregation service like Banjo is surprisingly useful, all the more so with its search feature, which lets you look up a given location and see what's going on there. I've poked around with it and found it very useful at a recent conference in San Francisco.
To make all this work, though, Banjo has some serious data obstacles to surmount. Not only must billions of updates be monitored and tracked for location-specific information, but that data must then be matched with the right Banjo users: those authorized to receive push notifications that they have a friend nearby.
Patton emphasized that this matching step matters because Banjo takes its privacy obligations very seriously and delivers content only to its intended audience.
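To make the matching problem concrete, here is a minimal sketch of how a nearby-friend check with a privacy filter might look. It is not Banjo's code: the `LocationUpdate` type, the `nearby_friend_alerts` function, the 500 m radius, and the assumption that each person has one unified identity across networks are all hypothetical choices for illustration. Distance uses the standard haversine great-circle formula.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass(frozen=True)
class LocationUpdate:
    user_id: str   # hypothetical unified identity across networks
    network: str   # network the update came from, e.g. "twitter"
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_friend_alerts(update, latest, friends, opted_in, radius_m=500):
    """Return (recipient_id, update) pairs for friends near `update`.

    latest:   dict user_id -> most recent LocationUpdate, from any network
    friends:  dict user_id -> set of friend user_ids
    opted_in: set of user_ids authorized to receive push notifications
    """
    alerts = []
    # Sorted only so the output order is deterministic.
    for friend_id in sorted(friends.get(update.user_id, set())):
        if friend_id not in opted_in:
            continue  # privacy filter: never notify unauthorized users
        other = latest.get(friend_id)
        if other is None:
            continue  # no known location for this friend
        if haversine_m(update.lat, update.lon, other.lat, other.lon) <= radius_m:
            alerts.append((friend_id, update))
    return alerts
```

In the Logan Airport scenario, a Twitter update from one friend would be compared against the other friend's last Foursquare location, and the alert fires even though the two updates came from different networks. Keeping `latest` up to date, and doing this at Banjo's scale within a latency budget, is outside the scope of this sketch.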
Getting all this to work with latencies of around 200 ms is a big challenge. Banjo's engineers initially ran the application on Amazon's EC2 for scalability and power, but that quickly proved cumbersome. Ultimately, when Fredrik Björk became the startup's Director of Engineering, the company switched to Heroku's hosted services, which let the Banjo app run on a full Ruby on Rails stack, Patton said.
Heroku also spared the company from dealing with sysadmin issues, and because Banjo then opted to use Heroku's hosted MongoDB service as its database, it was able to sidestep DBA overhead as well.
To get the real-time speeds Banjo requires, Patton added, more than 300 GB of data is processed in MongoDB's memory at any given time. And they need that speed: depending on the time of day, traffic can hit nearly 2,000 requests per minute.
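For a rough sense of scale, the peak traffic figure can be converted to a per-second rate and combined with the roughly 200 ms latency target using Little's law (L = λW). This is a back-of-the-envelope estimate of our own, not a figure Patton gave, and it assumes the 200 ms applies to each request:

```latex
\lambda = \frac{2000\ \text{requests}}{60\ \text{s}} \approx 33.3\ \text{requests/s}
\qquad
L = \lambda W \approx 33.3\ \text{requests/s} \times 0.2\ \text{s} \approx 6.7\ \text{requests in flight}
```

In other words, at peak only a handful of requests are in flight at once; the hard part is less the request rate than matching each request against a very large, constantly changing in-memory data set.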
If you're wondering about Hadoop and Hive, don't worry: they're part of Banjo's solution too, used to run batch analysis of trends and metrics.
After talking to Patton, I found it a little hard to tell whether Banjo is a company using a business model to tackle a big data problem or one using big data to serve a business strategy.
Either way, one thing is clear: Patton doesn't seem to obsess about the infrastructure needs of his company's services. For Patton, the focus is on delivering the right services for his customers.