Review: Intel's Westmere struts its stuff

Fast AES encryption, better scalability, and consistent per-core performance make the new six-core Xeon a worthy successor to Nehalem.

By Paul Venezia, InfoWorld | Hardware, Intel, processors

In the year since Intel released the Nehalem-EP quad-core Xeon CPU, all the hubbub surrounding that chip and its new design has proven accurate. Bigger, better, faster, more -- a whole lot more than anything that Intel had ever released before. But that was then, this is now, and as they say, what have you done for me lately?

Today, the Nehalem-EP gives way to the Westmere-EP, or X5600-series Xeon CPU. Whereas the Nehalem was a big-time performance bump over the previous generation, the Westmere is a more incremental and predictable improvement, but it's definitely a better chip. Westmere picks up where Nehalem left off.

[ Also on InfoWorld: Intel's Westmere and AMD's Magny-Cours will change the face of IT forever. See "Modern multi-core and the next generation of IT." ]

The key developments in Westmere-EP are two more cores (six total), the ability to address two DIMMs per channel at 1,333MHz, a 50 percent larger L3 cache, a set of instructions (AES-NI) for accelerating AES encryption, and better CPU power management. Westmere is the equal of Nehalem in single-threaded workloads, but far more scalable thanks to the additional two cores per die. The speed of Westmere's encryption operations will also turn heads.

In fact, the 400 percent performance increase shown with the AES-NI instructions makes whole-disk encryption almost unnoticeable. Previously, encryption required a fairly sizable performance trade-off, but with Westmere's AES performance jump, turning it on becomes a no-brainer. And that's just one of many potential use cases for the AES-NI features.
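For readers who want to confirm that a given Linux box exposes AES-NI, here is a minimal sketch. It assumes a Linux system, where the kernel lists the feature as `aes` on the `flags` lines of `/proc/cpuinfo`. The function name and the two sample strings are hypothetical and exist only for illustration; the samples are not captured from real hardware.

```python
def has_aes_ni(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the whole token so that unrelated flags containing "aes" do not count.
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


# Hypothetical, abbreviated samples for illustration only.
SAMPLE_WITH_AES = "processor : 0\nflags : fpu sse2 ssse3 sse4_1 sse4_2 popcnt aes\n"
SAMPLE_WITHOUT_AES = "processor : 0\nflags : fpu sse2 ssse3 sse4_1 sse4_2 popcnt\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AES-NI available:", has_aes_ni(f.read()))
    except OSError:
        print("No /proc/cpuinfo here; this check is Linux-only.")
```

Whether software actually uses the instructions is a separate question: the encryption library or disk-encryption layer has to be built to take advantage of them.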

Generation gap

The Westmere is built on the same basic guidelines as the Nehalem -- integrated memory controller, shared L3 cache per socket, and QPI (QuickPath Interconnect) -- but it's based on a 32nm process rather than Nehalem's 45nm. It runs at up to 3.33GHz per core, with two threads per core via Hyper-Threading. That's 24 logical CPUs in a two-socket system, all balanced against 6.4GT/s QPI. It's definitely fast, but not dramatically faster than a Nehalem CPU running at the same clock speed.
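The logical CPU count is simple multiplication, sketched below. The function name is hypothetical; the inputs come from the figures in this review (six cores per Westmere socket, four per Nehalem socket, two Hyper-Threading threads per core).

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs the OS sees: sockets x cores per socket x threads per core."""
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # Two-socket Westmere-EP with Hyper-Threading enabled.
    print("Westmere-EP:", logical_cpus(2, 6, 2))
    # Two-socket Nehalem-EP with Hyper-Threading enabled.
    print("Nehalem-EP:", logical_cpus(2, 4, 2))
```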

Like Nehalem, Westmere implements Turbo Mode to ramp up the clock speed on certain cores depending on load. Turbo Mode benefits single-threaded and lightly threaded applications by increasing the performance of a few cores when needed.

Also, Westmere CPUs sit in the same sockets as Nehalem CPUs. In fact, some Nehalem-based mainboards can support Westmere already, possibly requiring a BIOS update. This isn't true of all Nehalem systems, however, so do some research first.

In a bid to reduce power consumption, Westmere CPUs can essentially gate off unused cores and shut them down to reduce power, saving their state in cache. Yes, Nehalems can do this too, but Westmere chips can also gate off the uncore, or the region of the CPU that is tasked not with central processing but with memory control and L3 cache, bus controllers, and so on. Whereas a Nehalem could power gate each core, the Westmere can power gate everything, which has the benefit of reducing power draw at idle.

Also in the realm of reducing power consumption, the Westmere CPUs can use low-voltage DDR3 RAM running at 1.35 volts as well as standard DDR3 1.5-volt DIMMs. In addition to the relatively small reduction in power draw, low-voltage DIMMs generate less heat, thereby reducing overall cooling requirements, which is especially significant in servers and blades with high RAM counts.

Bench time

I had the opportunity to run a series of benchmarks on two sets of Westmere chips, the X5670s and X5680s. Both are six-core parts; the X5670s run at 2.93GHz per core, while the X5680s run at 3.33GHz per core. The tests were my standard array of real-world workloads rather than mainline benchmarking tools. They consist of LAME MP3 audio conversion tests, gzip and bzip2 compression tests, MD5 calculation tests, and MP4-to-FLV video conversion tests. Each of these tests is a single-threaded process, but they are run concurrently at increasing levels to measure the performance of the processors under various loads. I start at a 1:1 physical-core-to-process level, then ramp up the ratio significantly.
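The general shape of that methodology can be sketched in a few lines of Python. This is not the harness used for this review. It is a rough illustration that launches several copies of a single-threaded job at once and times how long the whole batch takes at increasing process-to-core ratios. The MD5 stand-in job, the function names, and the 1, 2, and 4 ratios are all assumptions made for the example.

```python
import subprocess
import sys
import time

# Stand-in single-threaded job: MD5 over an in-memory buffer (an assumption for
# illustration; the review's jobs were LAME, gzip, bzip2, MD5, and video conversion).
JOB = "import hashlib; hashlib.md5(b'x' * 5000000).hexdigest()"


def run_concurrent(n_procs: int, job: str = JOB) -> float:
    """Start n_procs copies of the job at once; return wall-clock seconds until all exit."""
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", job]) for _ in range(n_procs)]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError("benchmark job failed")
    return time.perf_counter() - start


def sweep(physical_cores: int, ratios=(1, 2, 4)) -> dict:
    """Time the batch at several process-to-physical-core ratios, starting at 1:1."""
    return {r: run_concurrent(physical_cores * r) for r in ratios}


if __name__ == "__main__":
    # physical_cores=2 keeps the demo small; a 12-core system would use 12.
    for ratio, seconds in sweep(physical_cores=2).items():
        print(f"{ratio}:1 processes per core -> {seconds:.2f}s")
```

Running each job as a separate OS process, rather than as a thread, keeps every job single-threaded and leaves it to the scheduler to spread the jobs across cores, which is the behavior this kind of test is meant to exercise.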

For these tests, I compared a two-CPU, 8-core 3.20GHz Nehalem W5580 system with 24GB of DDR3 RAM running at 1,333MHz to a two-CPU, 12-core 3.33GHz Westmere X5680 system with 24GB of DDR3 RAM running at 1,333MHz. Aside from the slight difference in clock speed and the core count, these are essentially the same chip, one generation apart. All tests were run from RAM disks to keep disk I/O from interfering with the raw CPU tests, and Hyper-Threading was enabled.


Originally published on InfoWorld.
