Industrial Light & Magic has been replacing its servers with the hottest new IBM BladeCenters -- literally, the hottest.
For every new rack ILM brings in, it cuts the data center's overall power use by 140 kW -- an 84% drop in energy consumption for the workload those servers take over.
But power density in the new racks is much higher: Each consumes 28 kW of electricity, versus 24 kW for the previous generation. Every watt of power consumed is transformed into heat that must be removed from each rack -- and from the data center.
The new racks are equipped with 84 server blades, each with two quad-core processors and 32GB of RAM. They are powerful enough to displace seven racks of older BladeCenter servers that the special effects company purchased about three years ago for its image-processing farm.
To cool each 42U rack, ILM's air conditioning system must remove more heat than would be produced by nine household ovens running at the highest temperature setting. This is the power density of the new infrastructure that ILM is slowly building out across its raised floor.
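The consolidation arithmetic behind those figures can be checked directly. The rack wattages below are the ones cited in this article; the small gap between the computed 83% and the cited 84% presumably reflects rounding.

```python
# Consolidation math using the rack figures cited in the article.
OLD_RACKS_REPLACED = 7
OLD_RACK_KW = 24   # previous-generation BladeCenter rack
NEW_RACK_KW = 28   # new BladeCenter rack with 84 blades

old_total_kw = OLD_RACKS_REPLACED * OLD_RACK_KW   # 168 kW
savings_kw = old_total_kw - NEW_RACK_KW           # 140 kW, as cited
reduction = savings_kw / old_total_kw             # ~0.83

print(f"Saved {savings_kw} kW per consolidation ({reduction:.0%} reduction)")
```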
These days, most new data centers have been designed to support an average density of 100 to 200 watts per square foot, and the typical cabinet is about 4 kW, says Peter Gross, vice president and general manager of HP Critical Facilities Services. A data center designed for 200 W per square foot can support an average rack density of about 5 kW. With carefully engineered airflow optimizations, a room air conditioning system can support some racks at up to 25 kW, he says.
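Gross's numbers reduce to simple arithmetic once you fix how much floor area each cabinet accounts for. The 25 square feet per rack used below (the cabinet plus its share of aisles and clearances) is an assumed figure chosen to reproduce his example, not one stated in the article.

```python
def avg_rack_kw(watts_per_sqft: float, sqft_per_rack: float = 25.0) -> float:
    """Average rack power a raised-floor design can support.

    sqft_per_rack is the total floor area attributable to one cabinet,
    including its share of aisle and clearance space (an assumption).
    """
    return watts_per_sqft * sqft_per_rack / 1000.0

# A 200 W/sq ft design supports roughly 5 kW racks, matching Gross's figure.
print(avg_rack_kw(200))
```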
Maximum operating temperatures for data center gear

Before 2004: 72 degrees F
2004 specification: 77 degrees F
2008 specification: 81 degrees F

Source: ASHRAE Technical Committee 9.9
At 28 kW per rack, ILM is at the upper limit of what can be cooled with today's computer room air conditioning systems, says Roger Schmidt, IBM fellow and chief engineer for data center efficiency. "You're hitting the extreme at 30 kW. It would be a struggle to go a whole lot further," he says.
[Read our related story, "Why data center temperatures have moderated."]
The sustainability question
The question is, what happens next? "In the future are watts going up so high that clients can't put that box anywhere in their data centers and cope with the power and cooling? We're wrestling with that now," Schmidt says. The future of high-density computing beyond 30 kW will have to rely on water-based cooling, he says. But data center economics may make it cheaper for many organizations to spread out servers rather than concentrate them in racks with ever-higher energy densities, other experts say.
Refresh your servers. Each new generation of servers delivers more processing power per square foot -- and per unit of power consumed. For every new BladeCenter rack Industrial Light & Magic is installing, it has been able to retire seven racks of older blade technology. Total power savings: 140 kW.
Charge users for power, not just space. "You can be more efficient if you're getting a power consumption model along with square-footage cost," says Ian Patterson, CIO at Scottrade.
Use hot aisle/cold aisle designs. Good designs, including careful placement of perforated tiles to focus airflows, can help data centers keep cabinets cooler and turn the thermostat up.
Kevin Clark, director of information technologies at ILM, likes the gains in processing power and energy efficiency he has achieved with the new BladeCenters, which have followed industry trends to deliver more bang for the buck. According to IDC, the average server price since 2004 has dropped 18%, while the cost per core has dropped by 70%, to $715. But Clark wonders whether doubling compute density again, as he has in the past, is sustainable. "If you double the density on our current infrastructure, from a cooling perspective, it's going to be difficult to manage," he says.
He's not the only one expressing concerns. For more than 40 years, the computer industry's business model has been built on the assumption that Moore's Law would continue to double compute density roughly every two years. Now some engineers and data center designers have begun to question whether that's feasible -- and whether a threshold has been reached.
The threshold isn't just about whether chip makers can overcome the technical challenges of packing transistors even more densely than today's 45nm technology allows, but whether it will be economical to run large numbers of extremely high-density server racks in modern data centers. The newest equipment concentrates more power into a smaller footprint on the raised floor, but the electromechanical infrastructure needed to support every square foot of high-density compute space -- from cooling systems to power distribution equipment, UPSs and generators -- is getting proportionally larger.
Data center managers are taking notice. In a 2009 IDC survey of 1,000 IT sites, 21% of respondents ranked power and cooling as their No. 1 data center challenge, 43% reported increased operational costs, and one-third had experienced server downtime as a direct result of power and cooling issues.
Christian Belady is the lead infrastructure architect for Microsoft's Global Foundation Services group, which designed and operates the company's newest data center in Quincy, Wash. He says the cost per square foot of a raised floor is too high. In the Quincy data center, he says, those costs accounted for 82% of the total project.
The case for, and against, running data centers hotter
Raising the operating temperature of servers and other data center gear doesn't always save on cooling costs. To hold processor and other component temperatures constant, most IT manufacturers ramp up equipment fan speeds once intake temperatures exceed about 77 degrees F, says IBM fellow Roger Schmidt. Above that point, fan speeds in most servers sold today increase significantly, and processors suffer higher leakage currents.
Power consumption increases as the cube of the fan speed -- so if speed increases by 10%, that means a 33% increase in power. At temperatures above 81 F, data center managers may think they're saving energy when in fact servers are increasing power usage at a faster rate than what is saved in the rest of the data center infrastructure.
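Schmidt's cube relationship (the fan affinity law) is easy to sanity-check:

```python
def fan_power_factor(speed_factor: float) -> float:
    """Relative fan power for a relative speed change (power ~ speed^3)."""
    return speed_factor ** 3

# A 10% speed increase costs about 33% more power, as Schmidt notes;
# conversely, running a fan 20% slower roughly halves its power draw.
print(f"{fan_power_factor(1.10) - 1:.0%}")   # 10% faster
print(f"{fan_power_factor(0.80):.0%}")       # 20% slower
```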
Bottom line: You would still save energy overall if you raised the temperature to 81, but going higher presents challenges to systems and component designers. Could equipment be designed to operate at higher temperatures? Possibly, Schmidt says. "Manufacturers will have to come together as a group to determine whether we should recommend a higher limit that will, in fact, save energy at the data center level."
Tom Bradicich, an IBM vice president for architecture and technology for the company's x86 servers, says that with all of the different equipment in a data center, getting the facility optimized for 81 degrees is difficult. Even getting the components in the boxes IBM builds to meet the current spec can be a challenge. "We're working in a world where we integrate a lot of third-party components. At the end of the day, IBM doesn't make the microprocessor and other components."
Dyan Larson, director of data center technology initiatives at Intel, thinks the day when everything in a data center can run safely at 81 degrees is still a long way off. "There's a reliability concern people have when it comes to running data centers at higher temperatures. Until the industry says, 'We're going to warranty these things for higher temperatures,' we're not going to get there."
"We're beyond the point where more density is better," Belady says. "The minute you double compute density, you double the footprint in the back room."
HP's Gross has designed large data centers for both enterprises and Internet-based businesses such as Google and Yahoo. Internet-based data centers consist of large farms of Web servers and associated equipment. Gross thinks Belady's costs are about average: Electromechanical infrastructure typically makes up about 80% of the cost of a new Tier 4 enterprise data center, regardless of the size of the facility, while the figure is generally 65% to 70% for Internet-based data centers, he says. Those numbers haven't increased much as power densities have risen in recent years, he adds.
As compute density per square foot increases, overall electromechanical costs tend to stay about the same, Gross says. But because power density also increases, the ratio of electromechanical floor space needed to support a square foot of high-density compute floor space also goes up.
IBM's Schmidt says the cost per watt, not the cost per square foot, remains the biggest construction cost for new data centers. "Do you hit a power wall down the road where you can't keep going up this steep slope? The total cost of ownership is the bottom line here," he says. Those costs have for the first time pushed some large data center construction projects past the $1 billion mark. "The C suites that hear these numbers get scared to death because the cost is exorbitant," he says.
Ever-higher energy densities are "not sustainable from an energy use or cost perspective," says Rakesh Kumar, an analyst at Gartner Inc. Fortunately, most enterprises still have a way to go before they see average per-rack loads in the same range as ILM's. Some 40% of Gartner's enterprise customers are pushing beyond the 8 to 10 kW per rack range, and some are as high as 12 to 15 kW per rack. However, those numbers continue to creep up.
In response, some enterprise data centers, and managed services providers like Terremark Inc., are starting to monitor power use and factor it into what they charge for data center space. "We're moving toward a power model for larger customers," says Ben Stewart, senior vice president of engineering at Terremark. "You tell us how much power, and we'll tell you how much space we'll give you."
But is it realistic to expect customers to know not just how much equipment they need hosted but how much power will be needed for each rack of equipment?
"For some customers, it is very realistic," Stewart says. In fact, Terremark is moving in this direction in response to customer demand. "Many of them come to us with a maximum-kilowatt order and let us lay out the space for them," he says. If a customer doesn't know its energy needs per cabinet, Terremark sells power by the "whip" -- the power cable feed to each cabinet.
Containment: The last frontier
IBM's Schmidt thinks further power-density increases are possible, but the methods by which data centers cool those racks will need to change.
More energy-efficiency tips
Look for the most efficiently designed servers. Hardware that meets the EPA's Energy Star specification offers features such as power management, energy-saving power supplies and variable-speed cooling fans. The upfront price may be slightly higher but is typically offset by lower operating costs over the product's life cycle.
Consider cold-aisle containment. Once you have a hot aisle/cold aisle design, the next step for cabinets exceeding about 4 kW is to use cold-aisle containment techniques to keep high-density server cabinets cool. This may involve closing off the ends of aisles with doors, using ducting to target cold air and installing barriers atop rows to prevent hot air from circulating over the tops of racks.
Use variable-speed fans. Computer room air conditioning systems rely on fans, or air handlers, to push cold air into the space and remove hot air. Because fan power rises with the cube of fan speed, modest speed reductions yield outsize savings: cutting fan speed by about 20% roughly halves a fan's power use.
ILM's data center, completed in 2005, was designed to support an average load of 200 W per square foot. The design has plenty of power and cooling capacity overall. It just doesn't have a method for efficiently cooling high-density racks.
ILM uses a hot aisle/cold aisle design, and the staff has successfully adjusted the number and position of perforated tiles in the cold aisles to optimize airflow around the carefully sealed BladeCenter racks. But to avoid hot spots, the room air conditioning system is cooling the entire 13,500-square-foot raised floor space to a chilly 65 degrees.
Clark knows it's inefficient; today's IT equipment is designed to run at temperatures as high as 81, so he's looking at a technique called cold-aisle containment.
Other data centers are already experimenting with containment -- high-density zones on the floor where doors seal off the ends of either the hot or cold aisles. Barriers may also be placed along the top of each row of cabinets to prevent hot and cold air from mixing near the ceiling. In other cases, cold air may be routed directly into the bottom of each cabinet, pushed up to the top and funneled into the return-air space in the ceiling plenum, creating a closed-loop system that doesn't mix with room air at all. "The hot/cold aisle approach is traditional but not optimal," says Rocky Bonecutter, data center technology and operations manager at Accenture. "The move now is to go to containment."
Using such techniques, HP's Gross estimates that data centers can support up to about 25 kW per rack using a computer room air conditioning system. "It requires careful segregation of cold and hot, eliminating mixing, optimizing the airflow. These are becoming routine engineering exercises," he says.
Liquid makes its entrance
While redesigning data centers to modern standards has helped reduce power and cooling problems, the newest blade servers are already exceeding 25 kW per rack. IT has spent the past five years tightening up racks, cleaning out raised floor spaces and optimizing air flows. The low-hanging fruit is gone in terms of energy efficiency gains. If densities continue to rise, containment will be the last gasp for computer-room air cooling.
Some data centers have already begun to move to liquid cooling to address high-density "hot spots" in data centers. The most common technique, called closely coupled cooling, involves piping chilled liquid, usually water or glycol, into the middle of the raised floor space to supply air-to-water heat exchangers within a row or rack. Kumar estimates that 20% of Gartner's corporate clients use this type of liquid cooling for at least some high-density racks.
These closely coupled cooling devices may be installed in a cabinet in the middle of a row of server racks, as data center vendor APC does with its InRow Chilled Water units, or they can attach directly onto each cabinet, as IBM does with its Rear Door Heat eXchanger.
Closely coupled cooling may work well for addressing a few hot spots, but it is a supplemental solution and doesn't scale well in a distributed computing environment, says Gross. IBM's Rear Door Heat eXchanger, which can remove up to 50,000 BTUs per hour -- about 15 kW -- could take out roughly half of the waste heat from ILM's 28-kW racks. But Clark would still need to rely on room air conditioners to remove the rest.
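The BTU-to-kilowatt conversion behind those numbers is a one-liner (1 BTU per hour is about 0.293 W):

```python
BTU_PER_HR_IN_KW = 0.29307107 / 1000.0   # 1 BTU/hr expressed in kW

def btu_hr_to_kw(btu_per_hr: float) -> float:
    return btu_per_hr * BTU_PER_HR_IN_KW

door_kw = btu_hr_to_kw(50_000)   # ~14.7 kW, the "15 kW" cited for the door
leftover_kw = 28 - door_kw       # ~13.3 kW still left for room air at ILM
print(f"{door_kw:.1f} kW removed by the door, {leftover_kw:.1f} kW remaining")
```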
Closely coupled cooling also requires building out a new infrastructure. "Water is expensive and adds weight and complexity," Gross says. It's one thing to run water to a few mainframes. But the network of plumbing required to supply chilled water to hundreds of cabinets across a raised floor is something most data center managers would rather avoid. "The general mood out there is, as long as I can stay with conventional cooling using air, I'd rather do that," he says.
"In the distributed model, where they use 1U or 2U servers, the power needed to support thousands of these nodes may not be sustainable," Schmidt says. He thinks data centers will have to scale up the hardware beyond 1U or 2U distributed x86-class servers to a centralized model using virtual servers running on a mainframe or high-performance computing infrastructure.
One way to greatly improve heat-transfer efficiency is through direct-liquid cooling. This involves piping chilled water through specialized cold plates that make direct contact with the processor. This is important because as processor temperatures rise, transistors suffer from an increase in leakage current. Leakage is a phenomenon in which a small amount of current continues to flow through each transistor, even when the transistor is off.
Using cold plates reduces processor leakage problems by keeping the silicon cooler, allowing servers to run faster -- and hotter. In a test of a System p 575 supercomputer, Schmidt says IBM used direct-liquid cooling to improve performance by one-third while keeping an 85 kW cabinet cool. Approximately 70% of the system was water-cooled.
Few data center managers can envision moving most of their server workloads onto expensive, specialized supercomputers or mainframes.
But IBM's Bradicich says incremental improvements such as low-power chips or variable-speed fans aren't going to solve the problem alone. Architectural improvements to the fundamental x86 server platform will be needed.
Cost, convergence and economies of scale
Like HP and other IT vendors, IBM is working on what Bradicich calls "operational integration" -- a converged infrastructure that combines compute, storage and networking in a single package. While the primary goal of converged infrastructure is to make systems management easier, Bradicich sees power and cooling as part of that package. In IBM's view, the x86 platform will evolve into highly scalable, and perhaps somewhat more proprietary, symmetric multiprocessing systems designed to dramatically increase the workloads supported per server -- and per rack. Such systems would require bringing chilled water to the rack to meet cooling needs.
But HP's Gross says things may be going the other direction. "Data centers are going bigger in footprint, and people are attempting to distribute them," he says. "Why would anyone spend the kind of money needed to achieve these super-high densities?" he asks -- particularly when they may require special cooling.
IBM's Schmidt says data centers with room-based cooling -- especially those that have moved to larger air handlers to cope with higher heat densities -- could save considerable energy by moving to water.
But Microsoft's Belady thinks liquid cooling's appeal will be limited to a single niche: high-performance computing. "Once you bring liquid cooling to the chip, costs start going up," he contends. "Sooner or later, someone is going to ask the question: Why am I paying so much more for this approach?"
More energy-efficiency tips
Turn on power management. Most servers ship with energy-saving technologies that control cooling-fan speeds and step down CPU power during idle times, but these features are often disabled by default -- and many data centers never turn them on. Consider enabling them by default, except in environments where high availability and fast response times are mission-critical.
Create zones. Break the data center floor into autonomous zones, where each block of racks has its own dedicated power and cooling resources. Zoning involves careful separation of hot and cold air but usually doesn't require that an area be physically partitioned off.
Douse hot spots with closely coupled cooling. A series of high power-density racks can create a hot spot that the room air conditioning system can't handle, or that forces IT to overcool the entire room to address a few cabinets. In those cases, consider supplemental spot-cooling systems. These require piping chilled liquid -- either cold water or glycol -- to a heat exchanger that's either attached or adjacent to a high-density cabinet.
He doesn't see liquid cooling as a viable alternative in distributed data centers such as Microsoft's.
The best way to take the momentum away from ever-increasing power density is to change the chargeback method for data center use, says Belady. Microsoft changed its cost allocation strategy and started billing users based on power consumption as a portion of the total power footprint of the data center, rather than basing it on floor space and rack utilization. After that, he says, "the whole discussion changed overnight." Power consumption per rack started to dip. "The whole density thing gets less interesting when your costs are allocated based on power consumed," he says.
Once Microsoft began charging for power, its users' focus changed from getting the most processing power in the smallest possible space to getting the most performance per watt. That may or may not lead to higher-density choices -- it depends on the overall energy efficiency of the proposed solutions. On the other hand, Belady says, "if you're charging for space, the motivation is 100% about density."
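The incentive shift Belady describes can be illustrated with a toy chargeback comparison. Every rate and configuration below is hypothetical, invented purely for illustration; only the two billing models come from the article.

```python
def space_charge(rack_sqft: float, rate_per_sqft: float = 30.0) -> float:
    """Space-based billing: density looks free, so users pack racks."""
    return rack_sqft * rate_per_sqft

def power_charge(avg_kw: float, rate_per_kw: float = 150.0) -> float:
    """Power-based billing: performance per watt is what gets rewarded."""
    return avg_kw * rate_per_kw

# Two hypothetical ways to host the same workload:
dense = {"sqft": 25, "kw": 28}    # one maxed-out high-density rack
spread = {"sqft": 75, "kw": 20}   # more floor space, more efficient gear

# Under space billing the dense rack wins; under power billing it loses.
for name, cfg in (("dense", dense), ("spread", spread)):
    print(f"{name}: space ${space_charge(cfg['sqft']):,.0f}/mo, "
          f"power ${power_charge(cfg['kw']):,.0f}/mo")
```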
Today, vendors design for the highest density, and most users select high-density servers to save on floor space charges -- even when the denser gear requires extra power distribution and cooling that drives up energy use. But on the back end, 80% of operating costs scale with electricity use -- and with the electromechanical infrastructure needed to deliver power and cool the equipment.
Run 'em hard, run 'em hot
Belady, who previously worked on server designs as a distinguished engineer at HP, argues that IT equipment should be designed to work reliably at higher operating temperatures. Current equipment is designed to operate at a maximum temperature of 81 degrees, under the 2008 specification from the ASHRAE Technical Committee 9.9; before the committee issued its first specification in 2004, the de facto limit was 72 degrees.
But Belady says running data center gear even hotter than 81 degrees could result in enormous efficiency gains.
"Once you start going to higher temperatures, you open up new opportunities to use outside air and you can eliminate a lot of the chillers ... but you can't go as dense," he says. Some parts of the country already turn off chillers in the winter and use economizers, which use outside air and air-to-air or air-to-water heat exchangers, to provide "free cooling" to the data center.
If IT equipment could operate at 95 degrees, most data centers in the U.S. could be cooled with air-side economizers almost year-round, he argues. And, he adds, "if I could operate at 120 degrees ... I could run anywhere in the world with no air conditioning requirements. That would completely change the game if we thought of it this way." Unfortunately, there are a few roadblocks to getting there. (See "The case for, and against, running data centers hotter.")
Belady wants equipment to be tougher, but he also thinks servers are more resilient than most administrators realize. He believes that the industry needs to rethink the kinds of highly controlled environments in which distributed computing systems are hosted today.
The ideal strategy, he says, is to develop systems that optimize each rack for a specific power density and manage workloads to ensure that each cabinet hits that number all the time. In this way, both power and cooling resources would be used efficiently, with no waste from under- or overutilization. "If you don't utilize your infrastructure, that's actually a bigger problem from a sustainability standpoint than overutilization," he says.
Belady sees a bifurcation coming in the market. High-performance computing will go to water-based cooling while the rest of the enterprise data center -- and Internet-based data centers like Microsoft's -- will stay with air but move into locations where space and power costs are cheaper so they can scale out.
More energy-efficiency tips
Retrofit for efficiency. While new data center designs are optimized for cooling efficiency, many older ones still have issues. If you haven't done the basics, optimizing perforated-tile placements in the cold aisle or putting blankets over cabling in the floor space are good places to start.
Install temperature monitors. It's not enough to monitor the room temperature. Adding more sensors allows better control in the row or rack.
Turn up the heat. The key to raising efficiency is raising intake temperatures at the cabinets: the higher the intake temperature, the more energy-efficient the data center. You probably can't cool every cabinet with intake air at 81 degrees, but you probably don't need to set the thermostat as low as 65, either.
Paul Prince, CTO of the enterprise product group at Dell, doesn't think most data centers will hit the power-density wall anytime soon. The average power density per rack is still manageable with room air, and he says hot aisle/cold aisle designs and containment systems that create "super-aggressive cooling zones" will help data centers keep up. Yes, densities will continue their gradual upward arc. But, he says, it will be incremental. "I don't see it falling off a cliff."
At ILM, Clark sees the move to water, in the form of closely coupled cooling, as inevitable. Clark admits that he, and most of his peers, are uncomfortable with the idea of bringing water into the data center. But he thinks that high-performance data centers like his will have to adapt. "We're going to get pushed out of our comfort zone," he says. "But we're going to get over that pretty quickly."
Robert L. Mitchell writes technology-focused features for Computerworld. Follow Rob on Twitter at http://twitter.com/rmitch, send e-mail to firstname.lastname@example.org or subscribe to his RSS feed.
This story, "Data center density hits the wall" was originally published by Computerworld.