The TAO API (application programming interface) "makes the entire data store feel like one unified system, while on the back end, we are able to distribute it across a wide number of machines, data centers and even regions," Venkataramani said.
TAO has been in full-scale deployment at Facebook for about two years. During peak hours, TAO can process more than 1.6 billion reads per second and 3 million writes per second.
Initiated in 2007, TAO started as a project to build an API that would provide an easy way for Facebook and third-party developers to build new services based on user data. The API offered data on the graph data model, which categorized all information as either objects or associations. An object could be a user or a specific post, and an association could be a pre-defined relationship between two nodes, such as a user "liking" a post. Each node or association can originate from any Facebook server around the world.
The Objects and Associations API paved the way for a number of very successful Facebook features, such as likes and events. But it also placed a heavy burden on the servers and software in the way that it requested data. So in 2009, Facebook engineers started work on developing a distributed service based on objects and associations that would be better suited for serving information in graph data structures.
Originally, Facebook user data was stored on MySQL, queried through PHP, and cached for quick accessibility on Memcache. Over time, the immense amount of data Facebook captured required the company to divide the database into hundreds of thousands of logical shards, with each shard holding a unique portion of data.
MySQL, which Facebook now views as a component of TAO, provides only persistent, or long-term, storage of data. Most of the information that users see is assembled from TAO's globally distributed in-memory cache, which is automatically populated with data as it is requested and submitted by users, while bumping out the least recently used (LRU) data. Only requests for older, rarely consulted data reach back to the MySQL databases.
The company no longer uses Memcache for caching duties (though Facebook continues to use the software in other systems).
Technically speaking, Memcache is closer to an in-memory data store rather than a caching mechanism, Venkataramani explained. As a result, the software didn't handle typical caching duties such as automatically maintaining consistency with the source database, or automatically drawing data from a database that has been requested by users. As a result, Facebook engineers had to write code to enable these features piecemeal, which complicated the overall architecture.