I'm on a social network, you're on a social network, and these days it seems we're all on at least one, such as Facebook or Twitter, if not two, three, or even more.
To be exact, the Pew Internet & American Life Project’s December 2008 tracking survey found that 35% of adult internet users now have a profile on an online social network site. If you're a teenager, the number jumps to 65%, but it's the young adults who are really behind the social networks: 75% of them belong to at least one. Since then, the social networks have only continued to grow at an explosive rate.
How do the social networks manage millions of users and hundreds of millions of updates? The answers lie in open-source software and thousands of servers. Let's take a look behind the doors of a few top social networks -- Twitter, LinkedIn, Facebook, and MySpace -- and see exactly how they pull their tricks off.
The first thing that jumps out at you is that they're almost all based on open-source software. For example, the operating systems behind Twitter, LinkedIn, and MySpace are all Linux. Facebook uses F5 Big-IP, which is a family of Linux-based appliances that also perform network management.
The story is the same when it comes to Web servers. Apache is the Web server of choice and LinkedIn uses Sun ONE Web Servers in addition to Apache. Most of the social networks also use Sun's MySQL database management system to organize their users' messages and status updates.
The social networks’ use of open-source software doesn't mean they make their own software available to you, however. They don’t. You can't, for example, just download LinkedIn 1.2345, and run your own local copy of LinkedIn.
That's changing over time though. In October 2008, Facebook open-sourced its server and network logging system, Scribe. Thus, Scribe is now available to anyone who has to deal with the unenviable task of tracking tens of thousands of servers.
A Code of Their Own
For the most part, we know only bits and pieces of how the social networks are built on top of their LAMP (Linux, Apache, MySQL, PHP/Python/Perl) stacks. Their application programming interfaces are available, because they want developers to write applications that will run on their network.
Using a network's API, developers add functionality to the site. For example, Lil Green Patch offers a badge for Facebook and MySpace users that donates to environmental causes based on clicks.
The networks don't want to share all their secrets on how they actually work their social magic, but we do know more about some than others. For example, Twitter is famous, or infamous in some circles, for using Ruby on Rails, an easy-to-use, open-source Web framework, as its foundation. To route its millions of daily messages, Twitter uses the open-source Jabber/XMPP (Extensible Messaging and Presence Protocol) instant messaging protocol.
Of course, "instant" is not always the word that comes to mind when it comes to Twitter's performance, as critics love to point out. But although critics have sought to blame Twitter's scaling problems on its use of Ruby on Rails, former Twitter architect Blaine Cook bluntly stated, "languages don't scale, architectures do."
Cook meant that while one language may be faster than another, that’s not what matters in scaling massively multiuser systems like a social network. In other words, it's not how quickly your code runs; it's how well the entire system runs when it scales from thousands of users and dozens of servers to millions of users and thousands of servers.
So, how busy are they? According to Royal Pingdom, a Swedish site that tracks Internet sites' uptime, Facebook is the busiest by far: "Facebook serves 260 billion page views per month." As Pingdom notes, it's no wonder Facebook now runs 30,000 servers. What's more surprising is that it takes only 30,000 servers to do the job. Or not, as the case may be.
When social networks can’t handle their users' demands, the price that's paid is usually stability. Twitter, with its "Fail Whale," was down 84 hours in 2008. LinkedIn came in second in this race to the bottom with 45.8 hours of downtime.
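To put those two figures side by side, here is a rough back-of-the-envelope sketch. It assumes a 30-day month and a perfectly even spread of page views across all 30,000 servers, neither of which is likely to be true in practice, so treat the result as an order-of-magnitude estimate only.

```python
# Rough average load per server, using only the figures quoted in the article.
PAGE_VIEWS_PER_MONTH = 260e9            # Pingdom's figure for Facebook
SERVERS = 30_000                        # server count cited in the article
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30-day month


def page_views_per_server_per_second(views, servers, seconds):
    """Average page views per server per second if load were spread evenly."""
    return views / servers / seconds


rate = page_views_per_server_per_second(
    PAGE_VIEWS_PER_MONTH, SERVERS, SECONDS_PER_MONTH
)
print(f"{rate:.2f} page views per server per second")
```

Under those assumptions the average works out to a little over three page views per server per second. That sounds modest, but each page view can trigger many database and cache lookups behind the scenes, and many of those servers are not Web servers at all.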
But others are doing better. Facebook, for example, only had 7.2 hours of downtime in 2008, despite a claimed server load of more than 35 million users updating their status at least once each day.
And over time, social networks are becoming more stable. Eighty-four percent of Twitter’s downtime, according to Pingdom, happened during the first half of 2008, and since then it seems to be managing better despite enormous growth.
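Downtime in hours is easier to compare as an uptime percentage. The sketch below converts the 2008 figures quoted above; the only assumption is that the measurement period is the full calendar year, which had 366 days.

```python
# Convert the quoted 2008 downtime figures into uptime percentages.
HOURS_IN_2008 = 366 * 24  # 2008 was a leap year

downtime_hours = {"Twitter": 84.0, "LinkedIn": 45.8, "Facebook": 7.2}


def uptime_percent(down_hours, total_hours=HOURS_IN_2008):
    """Share of the year, in percent, that a site was reachable."""
    return 100.0 * (1 - down_hours / total_hours)


for site, hours in downtime_hours.items():
    print(f"{site}: {uptime_percent(hours):.2f}% uptime")

# 84% of Twitter's downtime fell in the first half of the year.
twitter_first_half = 0.84 * downtime_hours["Twitter"]
twitter_second_half = downtime_hours["Twitter"] - twitter_first_half
print(f"Twitter: {twitter_first_half:.1f} h down in H1, "
      f"{twitter_second_half:.1f} h in H2")
```

By this measure, even Twitter's bad year comes out at roughly 99% uptime, while Facebook's is above 99.9%. On these numbers Twitter's second-half downtime would be about 13 hours, against roughly 70 in the first half.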
The real scalability bottleneck for Twitter and the rest of the social networks lies in how they handle the endless database reads and writes. Putting together a system that will respond to millions of users in real time is not a trivial task.
Most, if not all, of the social networks use the open-source program Memcached to address this problem. Memcached was created by programmers at the social network LiveJournal to deal with the massive data requirements of dynamic social network applications. Since it was built from the ground up for the demands of social networking, it is now used by Facebook, Twitter, and many others.
Technically, Memcached is a generic high-performance, distributed memory object caching system. With it, instead of simply caching local chunks of data that are in high demand, the Memcached servers and clients work together to implement a global cache. The result is a greatly reduced load on the database servers, and thus faster service for visitors.
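The idea can be sketched in a few lines of Python. This is not Memcached's code or any real client library: the cache nodes are plain dictionaries, the database is a hypothetical stand-in that counts its reads, and all class and function names are made up for illustration. What it shows is the general pattern: the client hashes each key to pick one node out of a shared pool, and the application checks the cache before going to the database.

```python
import hashlib


class CacheNode:
    """Stand-in for one cache server: just an in-memory dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class DistributedCache:
    """Client-side view of a pool of nodes treated as one global cache."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _node_for(self, key):
        # Hash the key so every client picks the same node for the same key.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def get(self, key):
        return self._node_for(key).store.get(key)

    def set(self, key, value):
        self._node_for(key).store[key] = value


class FakeDatabase:
    """Hypothetical slow database that counts how often it is read."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def load_status(self, user_id):
        self.reads += 1
        return self.rows[user_id]


def get_status(user_id, cache, db):
    """Check the cache first; fall back to the database only on a miss."""
    key = f"status:{user_id}"
    value = cache.get(key)
    if value is None:  # cache miss
        value = db.load_status(user_id)
        cache.set(key, value)
    return value


cache = DistributedCache([CacheNode(f"cache{i}") for i in range(3)])
db = FakeDatabase({"alice": "Eating breakfast", "bob": "Job hunting"})
first = get_status("alice", cache, db)   # miss: reads the database
second = get_status("alice", cache, db)  # hit: served from the cache
print(first, second, db.reads)
```

The second lookup never touches the database, which is the whole point. The simple modulo hashing used here reshuffles most keys whenever a node is added or removed, so production clients often use consistent hashing instead.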
While very handy, Memcached is not a universal answer for meeting social networks' database needs. Twitter, for example, can use Memcached, but because each user has a different view of all the messages, Memcached’s usefulness is limited. That's why Twitter has built Starling, a light-weight persistent queue server to handle its users' multiple Memcached requests.
Facebook faced a different challenge. As arguably the world's leading user of Memcached, Facebook was having trouble handling the sheer network traffic at the operating system level. It reworked Linux and Memcached's networking so it could handle 200,000 UDP requests per second with an average latency of 173 microseconds. The stock Linux kernel can handle only 50,000 UDP requests per second.
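To make "persistent queue" concrete, here is a toy sketch of the general technique. It is not Starling's implementation or protocol: the journal format, the class, and the delivery worker are all invented for illustration. Each operation is appended to a journal file so that pending work survives a restart, and a worker later drains the queue and copies each message into every follower's timeline.

```python
import json
import os
import tempfile
from collections import deque


class PersistentQueue:
    """Toy FIFO queue that journals every operation to a file."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.items = deque()
        self._replay()

    def _replay(self):
        # Rebuild the in-memory queue from the journal after a restart.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                record = json.loads(line)
                if record["op"] == "push":
                    self.items.append(record["item"])
                elif self.items:  # a "pop" record
                    self.items.popleft()

    def _log(self, record):
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(record) + "\n")

    def push(self, item):
        self._log({"op": "push", "item": item})
        self.items.append(item)

    def pop(self):
        if not self.items:
            return None
        self._log({"op": "pop"})
        return self.items.popleft()


def deliver(tweet_queue, followers, timelines):
    """Worker: drain the queue, copying each message to every follower."""
    while (job := tweet_queue.pop()) is not None:
        for follower in followers.get(job["author"], []):
            timelines.setdefault(follower, []).append(job["text"])


path = os.path.join(tempfile.mkdtemp(), "tweets.journal")
tweet_queue = PersistentQueue(path)
tweet_queue.push({"author": "alice", "text": "Pancakes for breakfast"})

# Simulate a restart: a fresh object recovers the pending job from disk.
recovered = PersistentQueue(path)
followers = {"alice": ["bob", "carol"]}
timelines = {}
deliver(recovered, followers, timelines)
print(timelines)
```

Putting a queue between "message posted" and "message delivered" lets the site accept a post immediately and do the per-follower work in the background, which matters when every user's view of the message stream is different.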
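A quick calculation shows what those numbers imply. The speedup is straightforward division. The in-flight estimate uses Little's law (average requests in flight equals arrival rate times average latency) and assumes that the quoted rate and latency describe the same machine in steady state, which the figures above do not explicitly say.

```python
# Figures quoted in the article.
STOCK_UDP_RPS = 50_000        # stock Linux kernel, requests per second
TUNED_UDP_RPS = 200_000       # after Facebook's changes
AVG_LATENCY_SECONDS = 173e-6  # 173 microseconds

# How much more traffic the reworked stack handles.
speedup = TUNED_UDP_RPS / STOCK_UDP_RPS

# Little's law: average requests in flight = arrival rate * average latency.
# Assumption: rate and latency apply to the same server in steady state.
in_flight = TUNED_UDP_RPS * AVG_LATENCY_SECONDS

print(f"Speedup: {speedup:.0f}x")
print(f"Average requests in flight: {in_flight:.1f}")
```

That is a fourfold gain in throughput, with only about 35 requests in flight at any instant on average.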
When it comes to hardware, social networks really get close-mouthed. Still, if you look hard enough you can get an idea of just how much high-tech equipment is required to make a major social network go.
Expensive? You betcha. A typical server for Facebook these days has two or more high-speed 64-bit Intel Nehalem processors with 4 or 8 cores.
Each of these servers has as much high-speed memory as it can handle. Typically, that's from 256GB to 512GB of RAM. These, in turn, are linked together into clusters.
To connect all this, social networks use equipment such as Force10 Networks' E-Series switches, which can handle up to 5 Tbps (terabits per second) on the backplane, with multiple gigabit and 10-gigabit Ethernet connections.
The data itself -- your notes, photos, etc. -- is kept on a variety of different vendors' storage servers. This storage is measured, not in gigabytes, but in terabytes and petabytes.
The social networks keep their server farms in data centers co-located, when possible, with the Internet's major Network Access Points (NAPs). NAPs are where the major Internet service providers connect to each other and where they provide the most direct access to the Internet's fastest connections.
The next connection for the social networks will be working out exactly how to turn all this software and hardware into dollars.
Considering the problems that designers have already overcome so that we can tweet to each other about our breakfasts, link in to a new job, or keep up with an old classmate on Facebook, it’s likely that some will find a way.
So you want to run a social network?
OK, so Facebook spends hundreds of millions on its systems, but you're looking at the software requirements -- free and open-source -- and you're thinking to yourself, "I could probably start MyVeryOwnSocialNetwork by running it on hosted virtual servers and build from there." And, you know what? You could.
In fact, that's exactly what Guy Kawasaki, the well-known managing director of the Garage Technology Ventures venture capital firm, did. Kawasaki didn't need any venture capital though to launch Truemors, his social/news site. He managed to launch Truemors with credit-card money: $12,107.09. The single largest item on the bill? Legal fees.
While Truemors isn't a big social-networking name, it's been a viable business now for almost three years. Not bad for an investment of just over 12 grand.
Kawasaki hired the Web developers, electric pulp, for his site. These days, however, you can do it yourself thanks to open-source social networking software.
Perhaps the best of these programs is Elgg 1.6.1. This is a standalone LAMP social networking program. It comes with user profiles, blogs, file repository, forum, social bookmarking, content syndication, and OpenID and OpenSocial Web services integration. A similar well-regarded "do-it-all" social network program, which is written in Ruby on Rails, is the oddly named Lovd by Less.
If, on the other hand, you're already using, say, Drupal for content management services, you can also layer social networking on top of it. A recent book, Drupal 6 Social Networking, gives guidance on how to build your own social network with Drupal.
Of course, if you just want a social network that caters to your own interests, but you don't want the hassle of running your own site, there are already networks that cater to your needs. The most well-known of these is Ning.
On Ning, creating your own network is free and requires little more than having an idea and an unused name for your network. It's not as full-featured as the others, but if you just want a common, semi-private place for your friends, company, 1st grade class of 1973, etc. to hang out, you can't beat it for ease of setup.
Of course, before you do any of this, you may want to ask yourself, "Does the world need another social network?" If you still answer, "yes," then the tools are already out there and ready for you.