May 26, 2009, 10:49 AM — Financial information provider Thomson Reuters is massive by any IT measure. The company, which employs 50,000 people in 91 countries, runs 20,000 servers, has been growing its 6.5 petabytes of data storage capacity 40% to 50% annually, and has seen its data center power consumption rise 20% a year.
The company was looking at building a new data center every two and a half years to keep up with growth. Its data centers in Eagan, Minn., draw 7.5 megawatts of power, so much electricity that the city of Eagan was asking Thomson Reuters to pay to upgrade the electrical supply grid.
Christopher Crowhurst, Thomson Reuters' vice president and chief architect of Architecture & Business Systems Infrastructure, said the company came up with a plan to double data storage utilization from 30% today to 60% by using virtualization and thin provisioning technology from NetApp Inc. The capital expenditure savings from that project will fund a server virtualization effort using VMware aimed at transitioning roughly one-third of the company's servers, about 7,500 machines, to virtual machines.
Crowhurst spoke to Computerworld about the project. The following are excerpts from that interview.
Over time, how much growth in your data centers have you experienced? Over the past five years, we've had 780% growth in storage, 450% growth in servers and 350% growth in power consumption. We're trying to bring down the power consumption growth rate to an annual rate of 13%.
How are you going to do that? When you do all this consolidation work, in effect you're recovering megawatts through the virtualization of existing assets. Then, over the next two years, we'll also be avoiding a megawatt of power growth through virtualization of growth assets [future server purchases]. The net effect of this project is that within 30 to 36 months, we will have saved a year's worth of growth.
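The arithmetic behind "recovering megawatts" through consolidation can be sketched as follows. The per-server wattage and the consolidation ratio below are illustrative assumptions, not figures from the interview; only the 2,500-server count comes from the article.

```python
# Back-of-the-envelope sketch: power freed by retiring physical servers
# and replacing them with a smaller number of virtualization hosts.
# Wattages and the VMs-per-host ratio are assumptions for illustration.

PHYSICAL_SERVER_WATTS = 350   # assumed average draw of a legacy server
HOST_WATTS = 500              # assumed draw of a larger virtualization host
CONSOLIDATION_RATIO = 15      # assumed VMs packed onto each host

def watts_recovered(servers_virtualized):
    """Net power freed: retired-server draw minus new-host draw."""
    hosts_needed = -(-servers_virtualized // CONSOLIDATION_RATIO)  # ceiling division
    retired = servers_virtualized * PHYSICAL_SERVER_WATTS
    added = hosts_needed * HOST_WATTS
    return retired - added

# 2,500 servers virtualized so far (the figure from the interview):
print(watts_recovered(2500) / 1e6, "MW")  # → 0.7915 MW
```

Under these assumed numbers, the first 2,500 conversions alone recover most of a megawatt, which is consistent with the scale of savings Crowhurst describes.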
How far along are you on the virtualization project? The conversion of physical servers to virtual is due to run until the end of next year. We're currently running at 140% of plan, so we're going to complete early. We're going to keep the project going, though. Because the project is funded by the storage optimization, as long as we're recovering capital from that, we can continue to virtualize our server environment. The reality is that the project will never end because the growth side of our technology platforms will continue to drive virtualization. I think what we'll eventually start doing is extend this into a private cloud and move to do some self-provisioning for our business units as we get more confident with the management tools in these virtual environments.
How many servers have you transitioned to VMs so far? About 2,500 servers. But there's also another part of our virtualization strategy that ends up in a target goal of 7,500 servers by the end of next year.
You said you have 6.5 petabytes of storage, about one-third of which is tape. How many storage area networks [SANs] do you have? We have two logical shared SANs, and we have several SANs dedicated to specific environments, such as one environment that has to be compliant with HIPAA for health care information, and another for public records databases that have to be segregated for privacy reasons. So there is a wealth of different SANs.
How do you back up your storage? For our NAS [network-attached storage] environment we use disk for backup, and for our SAN environment we use a virtual tape library [near-line disk array] and then store it to tape.
What vendors are you using for disk storage? We are a combined Hitachi Data Systems and EMC customer. And for NAS, we use NetApp.
As people say, one NAS box is great but 100 become unmanageable. Are you experiencing NAS sprawl problems? Well, I have one and a half petabytes of primary storage on NAS. I wouldn't say NAS sprawl is my problem; storage growth in general is the problem. Thomson Reuters Professional is an information company. We spend 41% of our technology [capital expenditure] on storage. Our annualized growth rate for storage was 49% over the past two years. So I agree with you that we end up with a lot of NAS, but we don't use NAS the way traditional companies would. When you think of NAS, you think of people putting their Windows file shares on it. We're using it for high-performing databases, for virtualization environments, and for replacing and consolidating direct-attached storage from our virtualization world onto consolidated platforms. So in reality we are driving great efficiency through our use of NAS.
What kind of efficiency? In our secondary backup NAS environment we're running at about 104% utilization because of deduplication and thin provisioning. So to me it's not a case of filer sprawl.
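Utilization above 100% sounds impossible until you separate logical data from physical capacity: deduplication stores each unique block once, so the logical data protected can exceed the disk it sits on. A minimal sketch of that accounting, with all figures assumed for illustration:

```python
# Sketch of how deduplication yields >100% "utilization": the logical
# backup data written exceeds the physical capacity it lands on.
# All numbers here are assumptions, not Thomson Reuters figures.

physical_capacity_tb = 500    # assumed backup filer capacity
logical_data_tb = 520         # assumed logical backup data written
dedupe_ratio = 3.0            # assumed: 3 logical TB stored per physical TB

physical_used_tb = logical_data_tb / dedupe_ratio
effective_utilization = logical_data_tb / physical_capacity_tb

print(f"physical disk actually consumed: {physical_used_tb:.0f} TB")
print(f"effective utilization: {effective_utilization:.0%}")  # → 104%
```

The "104%" in the interview is this kind of effective figure: logical terabytes protected divided by raw capacity, with deduplication and thin provisioning absorbing the difference.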
We're also very excited about the opportunity that 4TB SATA drives are going to give us, because that's going to allow us to do a physical consolidation over the next year or two. When we talk to the likes of Seagate, they've got them in their labs and they're testing them. The opportunity to throw a very small number of very large drives in a filer [NAS array] and get multiple tiers within one storage device is tremendous. We see a huge advantage in being able to go from flash drives to Fibre Channel to SATA all in one array and provide full tiering capability. It totally changes the cost model for storage in our mind.
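The cost-model change Crowhurst describes can be sketched as a blended cost-per-terabyte calculation: once one array mixes tiers, most capacity can sit on cheap SATA while only the hot fraction pays flash or Fibre Channel prices. The per-tier prices and the tier mix below are invented for illustration.

```python
# Illustrative blended cost per TB for a single array that mixes tiers.
# Per-TB prices and capacity fractions are assumptions, not vendor figures.

tiers = {
    # tier:          (assumed $/TB, fraction of total capacity)
    "flash":         (8000, 0.05),
    "fibre_channel": (2000, 0.45),
    "sata":          (400,  0.50),
}

blended = sum(price * fraction for price, fraction in tiers.values())
print(f"blended cost: ${blended:,.0f}/TB")  # → $1,500/TB
```

Under these assumed prices, the mixed array lands at $1,500/TB versus $2,000/TB for an all-Fibre-Channel tier-one build, and pushing more of the mix onto large SATA drives drives the blended figure down further.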
So how are you consolidating NAS now? We're in the process of moving to that highly tiered model. Currently, 90% of our NAS storage is what you'd consider a classic tier-one environment. We believe we should get that down to 50% tier one; the remainder will be cheap-and-deep disk. The second thing is that currently only about 30% of our allocated primary storage is active. We're going to try to drive that up to 60%. That's a fairly conservative number, but it in effect doubles your utilization of your existing footprint.
So we're expecting to be able to grow within the capacity we already have for a considerable period of time through the use of thin provisioning. And we're doing this because of the advances in the storage virtualization technology coming out with NetApp's Data ONTAP software, which allows us to move data between filers and within filers using virtual volumes.
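The 30%-to-60% target amounts to fitting roughly twice the active data into the same physical footprint, which is what defers new purchases. A rough sketch of that arithmetic, combining it with the 49% annual growth rate quoted earlier; the physical capacity figure is an assumption for illustration.

```python
# Thin provisioning sketch: when only a fraction of allocated storage is
# active, reclaiming the idle allocation lets the same physical footprint
# absorb future growth. The footprint size is an assumed number.
import math

physical_tb = 1000        # assumed physical footprint, illustrative
growth_rate = 0.49        # annual storage growth rate from the interview

active_before = physical_tb * 0.30   # 30% of the footprint holds live data today
active_after = physical_tb * 0.60    # target after thin provisioning

# Years of 49%-a-year growth absorbed by doubling effective utilization:
years_deferred = math.log(active_after / active_before) / math.log(1 + growth_rate)
print(f"{years_deferred:.1f} years of growth absorbed")  # → 1.7 years
```

At that growth rate, doubling utilization buys roughly a year and three quarters before the footprint fills again, which matches the "considerable period of time" framing rather than a permanent fix.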
So how are you funding the server virtualization project? Because of the storage optimization, we're leveraging the deferred [capital expenditure] spend from that and using it to fund our virtualization efforts with the servers. It will take us nearly three years to achieve the full benefits of this because we're doing it as we're refreshing our storage infrastructure. So we're doing it gently rather than with a big bang.
That's very important to us because as application profiles change over time, as different usage profiles from our customers change over time, we can move the data to the appropriate performance media.
Is that data migration automated or manual? Our intention is to start doing it manually, but with NetApp's Data ONTAP [version] 8 there are intentions to put policy-based movement in place; we're not ready to unleash that yet. But we are using policy-based management in our VMware environment, where we're allowing the ESX servers to move our server workloads around within our VMware farm using DRS, or Distributed Resource Scheduler.
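At its core, DRS-style rebalancing is a heuristic: when the load gap between the busiest and idlest hosts exceeds a threshold, migrate a VM to close the gap. The toy sketch below illustrates that idea only; it is not VMware's actual algorithm, and the host names and load numbers are invented.

```python
# Toy sketch of a DRS-like rebalancing pass: while the load gap between
# the hottest and coldest host exceeds a threshold, migrate a VM across.
# Illustrative heuristic only, not VMware's actual DRS algorithm.

def rebalance(hosts, threshold=20):
    """hosts: dict mapping host name -> list of per-VM CPU loads.
    Mutates hosts in place and returns the list of migrations made."""
    moves = []
    while True:
        load = {h: sum(vms) for h, vms in hosts.items()}
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        if gap <= threshold:
            return moves
        # Only move a VM small enough not to overshoot the target host;
        # this keeps each move strictly reducing the imbalance.
        candidates = [v for v in hosts[hot] if v <= gap / 2]
        if not candidates:
            return moves
        vm = max(candidates)
        hosts[hot].remove(vm)
        hosts[cold].append(vm)
        moves.append((vm, hot, cold))

farm = {"esx1": [30, 25, 20], "esx2": [10], "esx3": [15, 5]}
print(rebalance(farm))  # → [(30, 'esx1', 'esx2')]
```

The overshoot guard (only moving VMs no larger than half the gap) is what keeps a greedy pass like this from ping-ponging the same workload between hosts.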