How Social Networking Works

Have you ever wondered how social networks manage to keep you in contact with a few hundred of your closest friends in close to real time? Here's how they do it.

By Steven J. Vaughan-Nichols, ITworld

I'm on a social network; you're on a social network. These days it seems we're all on at least one social network, such as Facebook or Twitter, if not two, three, or even more.

To be exact, the Pew Internet & American Life Project’s December 2008 tracking survey found that 35% of adult internet users now have a profile on an online social network site. If you're a teenager, the number jumps to 65%, but it's the young adults who are really behind the social networks: 75% of them belong to at least one. Since then, the social networks have only continued to grow at an explosive rate.

How do the social networks manage millions of users and hundreds of millions of updates? The answers lie in open-source software and thousands of servers. Let's take a look behind the doors of a few top social networks -- Twitter, LinkedIn, Facebook, and MySpace -- and see exactly how they pull their tricks off.

The first thing that jumps out at you is that they're almost all based on open-source software. For example, the operating systems behind Twitter, LinkedIn, and MySpace are all Linux. Facebook uses F5 Big-IP, which is a family of Linux-based appliances that also perform network management.

The story is the same when it comes to Web servers. Apache is the Web server of choice, although LinkedIn uses Sun ONE Web Servers in addition to Apache. Most of the social networks also use Sun's MySQL database management system to organize their users' messages and status updates.

The social networks’ use of open-source software doesn't mean they make their own software available to you, however. They don’t. You can't, for example, just download LinkedIn 1.2345, and run your own local copy of LinkedIn.

That's changing over time though. In October 2008, Facebook open-sourced its server and network logging system, Scribe. Thus, Scribe is now available to anyone who has to deal with the unenviable task of tracking tens of thousands of servers.

A Code of Their Own

For the most part, we know only bits and pieces of how the social networks are built on top of their LAMP (Linux, Apache, MySQL, PHP/Python/Perl) stacks. Their application programming interfaces are available, because they want developers to write applications that will run on their network.

Using a network's API, developers add functionality to the site. For example, Lil Green Patch offers a badge for Facebook and MySpace users that donates to environmental causes based on clicks.

The networks don't want to share all their secrets on how they actually work their social magic, but we do know more about some than others. For example, Twitter is famous, or infamous in some circles, for its use of Ruby on Rails, an easy-to-use, open-source Web framework, as its foundation. To route its millions of daily messages, Twitter uses the open-source Jabber/XMPP (Extensible Messaging and Presence Protocol) instant messaging protocol.
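For readers who have never seen XMPP on the wire, here is a minimal sketch of what a single chat message looks like. XMPP messages are small XML fragments called stanzas. The addresses and the helper name `build_message` are made up for illustration, and nothing here reflects how Twitter actually wires XMPP into its systems, which the company has not described in detail.

```python
import xml.etree.ElementTree as ET


def build_message(sender, recipient, text):
    """Build one XMPP <message/> stanza as a string (illustrative helper)."""
    message = ET.Element(
        "message", {"from": sender, "to": recipient, "type": "chat"}
    )
    body = ET.SubElement(message, "body")
    body.text = text
    return ET.tostring(message, encoding="unicode")


# Hypothetical addresses; real XMPP addresses have the same user@domain shape.
stanza = build_message("alice@example.com", "bob@example.com", "Just had breakfast")
print(stanza)
```

A real client would send this stanza over a long-lived, authenticated XML stream rather than printing it, which is what lets the server push messages out as soon as they arrive.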

Of course, "instant" is not always the word that comes to mind when it comes to Twitter's performance, as critics love to point out. But although critics have sought to blame Twitter's scaling problems on its use of Ruby on Rails, former Twitter architect Blaine Cook bluntly stated, "languages don't scale, architectures do."

Cook meant that while one language may be faster than another, that’s not important in scaling massively multiuser systems like a social network. In other words, it's not how quickly your code runs; it's how well the entire system runs when it scales from thousands of users and dozens of servers to millions of users and thousands of servers.

So, how busy are they? Facebook is the busiest by far. According to Royal Pingdom, a Swedish service that tracks Internet sites' uptime, "Facebook serves 260 billion page views per month." As they note, it's no wonder Facebook now runs 30,000 servers. What's more surprising is that it only takes 30,000 servers to do the job. Or not, as the case may be.
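To put those two figures side by side, here is a rough back-of-the-envelope calculation. It assumes a 30-day month and traffic spread evenly across every server and every hour of the day. Neither assumption holds in practice, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope load per server, using only the two figures quoted above.
PAGE_VIEWS_PER_MONTH = 260_000_000_000  # Pingdom's figure for Facebook
SERVERS = 30_000                        # server count quoted in the article
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30-day month

views_per_second = PAGE_VIEWS_PER_MONTH / SECONDS_PER_MONTH
views_per_server_per_second = views_per_second / SERVERS

print(f"{views_per_second:,.0f} page views per second site-wide")
print(f"{views_per_server_per_second:.2f} page views per second per server")
```

That works out to roughly 100,000 page views per second across the site, or a little over three per server. The average sounds modest, but peak traffic runs well above the average, and not every one of those servers is answering page requests.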

When social networks can’t handle their users' demands, the price that's paid is usually stability. Twitter, with its "Fail Whale," was down 84 hours in 2008. LinkedIn came in second in this race to the bottom with 45.8 hours of downtime.

But others are doing better. Facebook, for example, only had 7.2 hours of downtime in 2008, despite a claimed server load of more than 35 million users updating their status at least once each day.

And over time, social networks are becoming more stable. Eighty-four percent of Twitter’s downtime, according to Pingdom, happened during the first half of 2008, and since then it seems to be managing better despite enormous growth.
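Downtime in hours is easier to compare once it is turned into the uptime percentages that hosting providers like to quote. The short calculation below does that using only the 2008 figures above; the function name `uptime_percent` is just for illustration.

```python
HOURS_IN_2008 = 366 * 24  # 2008 was a leap year: 8,784 hours

# Downtime figures for 2008 as quoted in the article.
downtime_hours = {"Twitter": 84.0, "LinkedIn": 45.8, "Facebook": 7.2}


def uptime_percent(hours_down, hours_total=HOURS_IN_2008):
    """Share of the year a site was reachable, as a percentage."""
    return 100.0 * (1.0 - hours_down / hours_total)


for site, hours in downtime_hours.items():
    print(f"{site:<9} {hours:5.1f} h down -> {uptime_percent(hours):.2f}% uptime")

# 84% of Twitter's downtime fell in the first half of the year.
twitter_first_half = 0.84 * downtime_hours["Twitter"]
twitter_second_half = downtime_hours["Twitter"] - twitter_first_half
print(f"Twitter: {twitter_first_half:.1f} h in H1, {twitter_second_half:.1f} h in H2")
```

In other words, Twitter managed roughly 99.04% uptime, LinkedIn 99.48%, and Facebook 99.92%. Twitter's split is about 70.6 hours in the first half of the year against 13.4 hours in the second.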

The real scalability bottleneck for Twitter and the rest of the social networks lies in how they handle the endless database reads and writes. Putting together a system that will respond to millions of users in real time is not a trivial task.

Most, if not all, of the social networks use the open-source program Memcached to address this problem. Memcached was created by programmers at the social network LiveJournal to deal with the massive data requirements of dynamic social network applications. Since it was built from the ground up for the demands of social networking, it is now used by Facebook, Twitter, and many others.

Technically, Memcached is a generic high-performance, distributed memory object caching system. With it, instead of simply caching local chunks of data that are in high demand, the Memcached servers and clients work together to implement a global cache. The result is a greatly reduced load on the database servers, and thus faster service for visitors.

While very handy, Memcached is not a universal answer for meeting social networks' database needs. Twitter, for example, can use Memcached, but because each user has a different view of all the messages, Memcached’s usefulness is limited. That's why Twitter has built Starling, a light-weight persistent queue server to handle its users' multiple Memcached requests.
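What does a "persistent queue server" look like? The toy below gives the flavor. Starling's own documentation describes it as speaking the memcache protocol, with a `set` adding an item to a named queue and a `get` taking the next one off. The sketch borrows only that set/get convention. The journal-file format, the class name `TinyPersistentQueue`, and the queue name are all assumptions made for illustration, not a description of Starling's internals.

```python
import json
import os
import tempfile
from collections import defaultdict, deque


class TinyPersistentQueue:
    """Toy queue server: set() enqueues, get() dequeues, a journal survives restarts."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.queues = defaultdict(deque)
        self._replay()

    def _replay(self):
        # Rebuild the in-memory queues from the journal written by earlier runs.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, "r", encoding="utf-8") as journal:
            for line in journal:
                op, queue, item = json.loads(line)
                if op == "set":
                    self.queues[queue].append(item)
                elif self.queues[queue]:
                    self.queues[queue].popleft()

    def _log(self, op, queue, item=None):
        # Append-only journal: one JSON record per operation.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps([op, queue, item]) + "\n")

    def set(self, queue, item):
        self._log("set", queue, item)
        self.queues[queue].append(item)

    def get(self, queue):
        if not self.queues[queue]:
            return None
        self._log("get", queue)
        return self.queues[queue].popleft()


journal_file = os.path.join(tempfile.mkdtemp(), "queue.journal")
server = TinyPersistentQueue(journal_file)
server.set("timeline_updates", "alice: hello")
server.set("timeline_updates", "bob: hi")
first_item = server.get("timeline_updates")

# Simulate a restart: a new instance replays the journal and finds the unread item.
restarted = TinyPersistentQueue(journal_file)
remaining_item = restarted.get("timeline_updates")
print(first_item, "|", remaining_item)
```

The point of the queue is to take work off the critical path: the Web tier drops a message in and returns at once, while background workers drain the queue at their own pace. Because every operation is journaled, a crash does not lose messages that have not yet been delivered.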

Facebook faced a different challenge with Memcached. As arguably the world's leading user of Memcached, Facebook was having trouble handling the sheer network traffic at the operating system level. It reworked Linux and Memcached's networking so it could handle 200,000 UDP requests per second with an average latency of 173 microseconds. The stock Linux kernel can handle only 50,000 UDP requests per second.
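Those numbers reward a moment of arithmetic. The sketch below computes the improvement over the stock kernel and then applies Little's law (average items in a system equals arrival rate times average time in the system) to estimate how many requests are in flight at once. Applying Little's law here is our own assumption: it presumes the figures describe one machine running steadily at the quoted rate.

```python
# Figures quoted in the article.
TUNED_REQUESTS_PER_SECOND = 200_000
STOCK_REQUESTS_PER_SECOND = 50_000
AVERAGE_LATENCY_SECONDS = 173e-6  # 173 microseconds

speedup = TUNED_REQUESTS_PER_SECOND / STOCK_REQUESTS_PER_SECOND

# Little's law: average requests in flight = arrival rate x average latency.
requests_in_flight = TUNED_REQUESTS_PER_SECOND * AVERAGE_LATENCY_SECONDS

print(f"Throughput improvement: {speedup:.0f}x")
print(f"Average requests in flight: {requests_in_flight:.1f}")
```

That is a fourfold gain in throughput, with only about 35 requests being handled at any instant. Each request is in and out so quickly that very little work ever piles up.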

Social Servers

When it comes to hardware, social networks really get close-mouthed. Still, if you look hard enough you can get an idea of just how much high-tech equipment is required to make a major social network go.

Expensive? You betcha. A typical server for Facebook these days has two or more high-speed 64-bit Intel Nehalem processors with 4 or 8 cores.

Each of these servers has as much high-speed memory as it can handle. Typically, that's from 256GB to 512GB of RAM. These, in turn, are linked together into clusters.

To connect all this, social networks use equipment such as Force10 Networks' E-Series switches, which can handle up to 5 Tbps (terabits per second) on the backplane with multiple gigabit and 10-gigabit Ethernet connections.

The data itself -- your notes, photos, etc. -- is kept on a variety of different vendors' storage servers. This storage is measured, not in gigabytes, but in terabytes and petabytes.

The social networks keep their server farms in data centers co-located, when possible, with the Internet's major Network Access Points (NAPs). NAPs are where the major Internet service providers connect to each other and where they provide the most direct access to the Internet's fastest connections.

The next connection for the social networks will be working out exactly how to turn all this software and hardware into dollars.

Considering the problems that designers have already overcome so that we can tweet to each other about our breakfasts, link in to a new job, or keep up with an old classmate on Facebook, it’s likely that some will find a way.

So you want to run a social network?

OK, so Facebook spends hundreds of millions on its systems, but you're looking at the software requirements -- free and open-source -- and you're thinking to yourself, "I could probably start MyVeryOwnSocialNetwork by running it on hosted virtual servers and build from there." And, you know what? You could.

In fact, that's exactly what Guy Kawasaki, the well-known managing director of the Garage Technology Ventures venture capital firm, did. Kawasaki didn't need any venture capital though to launch Truemors, his social/news site. He managed to launch Truemors with credit-card money: $12,107.09. The single largest item on the bill? Legal fees.

While Truemors isn't a big social-networking name, it's been a viable business now for almost three years. Not bad for an investment of just over 12 grand.

Kawasaki hired the Web development firm Electric Pulp to build his site. These days, however, you can do it yourself thanks to open-source social networking software.

Perhaps the best of these programs is Elgg 1.6.1. This is a standalone LAMP social networking program. It comes with user profiles, blogs, file repository, forum, social bookmarking, content syndication, and OpenID and OpenSocial Web services integration. A similar well-regarded "do-it-all" social network program, which is written in Ruby on Rails, is the oddly named Lovd by Less.

If, on the other hand, you're already using, say, Drupal for content management services, you can also layer social networking on top of it. A recent book, Drupal 6 Social Networking, gives guidance on how to build your own social network with Drupal.

Many people seem interested in recreating the look of the popular story-sharing site Digg. For them, Pligg offers a simple do-it-yourself way to create your own version of Digg.

Of course, if you just want a social network that caters to your own interests, but you don't want the hassle of running your own site, there are already networks that cater to your needs. The best known of these is Ning.

On Ning, creating your own network is free and requires little more than having an idea and an unused name for your network. It's not as full-featured as the others, but if you just want a common, semi-private place for your friends, company, 1st grade class of 1973, etc. to hang out, you can't beat it for ease of setup.

Of course, before you do any of this, you may want to ask yourself, "Does the world need another social network?" If you still answer, "yes," then the tools are already out there and ready for you.

2 comments

    Anonymous 2 years ago
    Useful article! We use Elgg to run several verticalized communities targeting communities as diverse as Rugby Fans and Investment Professionals. We have built a number of enterprise extensions on top of Elgg that address issues such as commercial licensing of content between community members, billing and merchant services, news aggregation, automated digital marketing, automated meta-tagging, etc.

    The underlying platform is extremely flexible, and as extensible as your budget and imagination allow. Elgg has a very established footprint in the education sector, with literally hundreds of universities using it, as well as significant usage in corporate and government applications.

    Our view has always been that use of an established Open Source platform substantially mitigates the risks of using a proprietary platform from a small vendor. You end up with substantial investment tied up in the data, and the prospect of losing operational continuity because of a change in business plan, or the insolvency of the vendor, is a scary thought. With Open Source, even though the software is often created by a small company, it has a life of its own and would survive the demise of the initiator because the usage community has a collective interest in its continuation and would pick things up. You are also afforded a choice of support and development resources, as the platforms develop their own vendor ecosystems.

    With Elgg's widespread adoption in academia, government and the private sector we think there is a high level of confidence in its continuity, and even if that wasn't the case, we have all the code and are allowed to maintain and modify it ourselves, subject to the restrictions of the license. We don't see that level of protection regarding continuity in any of the available proprietary alternatives, and the development momentum afforded by community input in any case means the functional aspects are both amongst the best and the most easily extensible. I believe this is why most of the successful social platforms have exhibited a preference for an open source stack, in addition to the obvious initial cost benefits.

    No one should be under the illusion that Open Source means free, however, unless your usage is at the "hobbyist" level. Although the software itself is free, you still need to plan for the costs of support, maintenance, versioning, operations and refinement over time, and this often means specialist commercial support and development is the best way to proceed. Elgg have recently announced a plan to offer a turnkey hosted network capability verticalized around educational needs, a "social learning environment" marketed as ElggCampus, and an enterprise-targeted version built on and fully compatible with the Open Source core, with self-managed, hosted, and cloud-based options.

    We ourselves retain the creators of Elgg on a contract basis for second-line architectural support, and actively collaborate with them in design discussions about new features, as well as contributing back to the OS core and putting certain extensions back in the public domain as community plug-ins.

    The big distinction versus things like Ning is that you retain explicit ownership of your data, and the data can be readily moved into other deployment models, and even onto other platforms should Elgg no longer meet your needs.

    Our experience is that whilst platforms like Drupal are extremely good at what they do, the addition of social features is not the same thing as creating a genuinely social community. CMS frameworks such as Drupal are inherently hierarchical and roles based, with the emphasis on the projection of content by an arbiter, whereas a social community essentially requires a peer-to-peer model, with capabilities defined in terms of entitlements rather than roles. This has important architectural implications if you want to create a genuinely user-centric community, as opposed to a content-centric one. These are addressed within Elgg at a fundamental level.
    Anonymous 2 years ago
    I really enjoyed reading this. Well done, with lots of interesting information. Thanks!
