New chips don't deliver, Facebook says

By Stephen Lawson, IDG News Service

The latest generations of server processors from Intel and Advanced Micro Devices don't deliver the promised gains in performance, according to the head of technical operations at Facebook, a massive consumer of servers.

The social networking company is constantly trying to upgrade its infrastructure to keep up with growth in users and data, while trying to minimize power consumption to save money, said Jonathan Heiliger, vice president of technical operations. He was interviewed on stage by GigaOm Network founder Om Malik at GigaOm's Structure conference in San Francisco on Thursday. Malik asked him about unexpected problems in keeping up.

"The biggest thing (that) surprised us is ... less-than-anticipated performance gains from new microarchitectures -- so, new CPUs from guys like Intel and AMD. The performance gains they're touting in the press, we're not seeing in our applications," Heiliger said. "And we're, literally in real time right now, trying to figure out why that is."

The hardware industry has also fallen short when it comes to delivering very power-efficient servers to carry out a limited set of functions for companies such as Facebook and Amazon, Heiliger said. He had some words for server OEMs (original equipment manufacturers).

"You guys don't get it," Heiliger said. "To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient." That means more than just an efficient power supply, but a whole system down to the processor, he said. Google has done a great job designing and building its own servers for this kind of use, Heiliger added.

Facebook is still working with server makers on this and doesn't know why they continue to fail, Heiliger said. He hopes to see cooperation among organizations deploying large computing clusters to develop a set of common standards that vendors can design for.

Heiliger had one piece of advice for anyone building an infrastructure to handle large-scale Internet-based services.

"There's a pretty simple answer for scaling infrastructure. It's, 'Don't be cheap,'" Heiliger said. He added that Facebook does drive hard bargains with its hardware and software infrastructure suppliers, and is careful not to overbuy.

The best way to scale up a system is to look at application, software and hardware infrastructure, pick one to focus on, and add to that first. Facebook focuses on application infrastructure and upgrades the other two to keep up with that, he said.

Testing is another key to the operational success of Facebook, which has more than 200 million users and frequently introduces new features. Heiliger said the launch of Facebook's personalized usernames earlier this month went smoothly, despite an explosive response when it first went live, because of extensive testing of the new feature.

It took about two months to roll out the new feature, from concept to availability, he said. When the personalized usernames became available on a first-come, first-served basis, at 9 p.m. on a Friday night at Facebook's Silicon Valley headquarters, users claimed 1 million names in the first hour without slowing down the service as a whole, he said.

18 comments

    Anonymous 2 years ago
    The article doesn't specify which operating system they are working with. The chickens are starting to come home to roost for Windows and Linux. Now that chips from Intel and AMD are capable of supporting dozens of threads in a single system, it follows that the operating system must scale to high thread counts. Linux and Windows are inadequate in this area. Solaris, on the other hand, scales to hundreds of threads due to Sun's experience in scalability on very large SMP SPARC platforms, and Sun is now reaping the benefit of that investment on 4-socket Intel/AMD systems with 6 cores per socket. The "SPARC fanboy" in an earlier post may have missed the mark on hardware, but he's right on target with the OS choice.
    Anonymous 2 years ago
    So what he's saying is that Google builds themselves some great data centers using existing Intel and AMD processors, but Facebook cannot do the same? Seems like his criticism would be better directed at the designers of the server platforms they buy and/or the designers of their data centers. They're trying to find "in real time" why they're not getting the perf increases they expected. I'm guessing it's unreasonable expectations, combined with sub-optimal application architecture. The apps are probably not as able to efficiently take advantage of all the cores in a system as they thought they were, and that has more to do with software architecture than processor microarchitecture. The processor microarchitectures have been well known and documented for years before they were released, and my bet is they haven't properly designed apps to take advantage of their power.
    Anonymous 2 years ago
    He says they need lots of servers so "the servers have to be cheap." But really they have to buy servers regardless of the price if they want to keep up with the growth, so there is no need for server makers to make them cheap. Notice later he also says, "There's a pretty simple answer for scaling infrastructure. It's, 'Don't be cheap.'" Quality doesn't come cheap. By the way, it sucks that after I preview, if I make another change I have to re-enter the captcha. And why can't I get blank lines between paragraphs?
    ITworld staff 2 years ago in reply to Anonymous
    Edward, thank you for the feedback on the captcha re-entry and line spacing. Changes are coming soon!
    Anonymous 2 years ago
    Could It Bee Your Crummy Pee Haitch Pee?
    Anonymous 2 years ago
    "Testing is another key to the operational success of Facebook, which has more than 200 million users and frequently introduces new features. Heiliger said the launch of Facebook's personalized usernames earlier this month went smoothly, despite an explosive response when it first went live, because of extensive testing of the new feature." Yes, congratulations! You gave people a username. That's a fantastic achievement - and a very useful one as well. Great to have such forward thinking in the world.
    Anonymous 2 years ago
    Bad web application code and design == Bad performance on any web server.
    Anonymous 2 years ago
    1) Facebook & Amazon need cheap, power-efficient systems
    2) Intel and AMD aren't measuring up with processors to power these systems
    3) However, Google has systems appropriate for this use (presumably using Intel or AMD processors)
    If that's his argument, then it would seem that the real conclusion is that Facebook can't build systems as good as Google's, even though they are using the same processor technology.
    Anonymous 2 years ago in reply to Anonymous
    Google's custom motherboards take a single 12V input voltage, then use high-efficiency onboard transformers to provide the various voltages required by different subsystems. Having the power supply provide all those voltages (like it's done in servers from HP, Dell, etc.) is much less efficient. So it's not a flawed argument based on the commonality of Intel and AMD processors. A lot of power is lost in your 80% efficient AC power supply, and making the power supply more efficient isn't the best solution. But, yes, it's true, most companies don't have and can't justify the engineering resources to custom design system boards like Google has.
    Anonymous 2 years ago in reply to Anonymous
    If I had the cash, I'd roll out dual-core 1.6GHz Atom boards. They have a small form factor, and I bet you could cram loads of them into a custom rack, Google-style. Some might argue that the Atoms don't have enough juice, but if you consider the power consumption and the form factor, I think it's a viable solution.
    Anonymous 2 years ago in reply to Anonymous
    "However, Google has systems appropriate for this use (presumably using Intel or AMD processors)" The argument was directed at server vendors.
    Anonymous 2 years ago in reply to Anonymous
    It's not just about the processor, it's about the entire integrated system. From the power supply, to the motherboard layout, to what voltages are actually necessary to send to the motherboard, ... Google designs and builds their own power supplies, motherboards, etc., to be power efficient (http://news.cnet.com/8301-1001_3-10209580-92.html). No OEM comes close to doing anything like it.
    Anonymous 2 years ago in reply to Anonymous
    That seems to be the case and I agree completely. So basically, Facebook is going crazy trying to find a solution to the $300 million of servers that they are using. So they try buying new Intel and AMD processors, only to find out they aren't as good as advertised... duh
    Anonymous 2 years ago in reply to Anonymous
    Or perhaps Google, a seemingly larger organization, can better exploit economies of scale, so they can afford to do their own R&D and even some fab, while Facebook has to rely upon the offerings of HP, Sun, etc.
    Anonymous 2 years ago in reply to Anonymous
    I understand that argument, but the truth is that this is a guy from Facebook. First, he's not speaking for Amazon so they shouldn't be in the argument. They very well may have more optimized server hardware. Second, Google is a company making many BILLIONS of dollars worth of profit. Facebook has yet to make one. Google can afford to design (or pay for someone to design) their own hardware platforms. Facebook doesn't have this option. What he was basically saying is, if Google can do it, why can't an OEM offer this for the many other companies who don't have that money? Google uses commodity hardware to reach this goal so it should be relatively easy and profitable for OEM vendors to do this also.
    Anonymous 2 years ago
    Look at something like a Sun T5540 if you want power/performance ratios as well as something that is a monster of a database server and webserver. I mean, it's hard to beat 256 simultaneous process threads when talking about web hosting or database hosting.
    Anonymous 2 years ago in reply to Anonymous
    Sun is assuredly not inexpensive, and the only thing fast about SPARC systems is how quickly they drain your wallet. Power-efficient servers are easy to build. You simply give up lots of things you would otherwise get in a 1U or blade. I am guessing that the Facebook person must not know how to build servers, or what goes into servers. Or he would understand that power efficient *MEANS* slow. If this guy really wants to design large, fast clusters of power-efficient machines at huge scale, he should be talking to the custom HPC solution shops that have been doing this for their customers for years. Point this guy to here and here. Ignore some of the large companies, and focus upon those that will design the systems for you correctly from the outset.
    Anonymous 2 years ago in reply to Anonymous
    Another diehard SPARC fanboy yearning for the good old days of the '90s. The only non-x86 processor that anybody cares about is ARM, and not for much longer.
