June 19, 2013, 11:55 AM — CERN is making the infrastructure that handles the data from the Large Hadron Collider (LHC) more flexible by upgrading it with OpenStack for virtualization and Puppet for configuration management.
The research organization's objective is to change how it provides services to scientists working at the LHC, which runs in a 27-kilometer circular tunnel about 100 meters beneath the Swiss-French border near Geneva.
"One of the things we have to contend with is how to scale our infrastructure fairly significantly with a fixed staff and fixed costs. With a fixed budget you can buy more and more equipment, but you can't provide more and more services with the same number of people," said Ian Bird, LHC computing grid project leader.
But that may be possible if you change the way things are done. CERN's goal is to become more efficient by moving toward infrastructure-as-a-service and platform-as-a-service on a private cloud, so that it can change more dynamically how the infrastructure is used. The accelerator is currently shut down, so the CERN data center has a different workload than it did last year when the LHC was running, according to Bird.
"Users also want to provision an analysis cluster with 50 machines themselves for an afternoon that then goes away again. It is about providing those kinds of services," Bird said.
CERN chose OpenStack because it seems to be the platform with the most traction behind it. OpenStack's popularity also makes it attractive from a staffing point of view, according to Bird.
"We have a transient staff, because not everybody has permanent contracts. So it's good to have people that come in with that expertise or can leave with it, and then sell it somewhere else," Bird said.
CERN is also replacing the custom in-house software that manages the cluster with tools such as Puppet.
"When we started scaling up the cluster for LHC, the large-scale Googles and Amazons didn't really exist. So we invested quite a lot of effort in configuration management and monitoring, but a couple of years ago we decided to instead go with something that had a larger support community," Bird said.
CERN looked at Chef and Puppet, and chose the latter because it worked in a way that was closer to its own management model. The rollouts of Puppet and OpenStack are both underway.
Today CERN's infrastructure is distributed across about 160 data centers of different sizes located around the world.
"The reason behind that is twofold: one is, given the size of the data center we have here, there is no way we could have done all the computing for the LHC, and the other is political and sociological. We are given money to do computing, but it is preferred that the funding stays where it is coming from," said Bird.