Through a nebula, darkly
In 2009, SARA conducted a small-scale pilot experiment with five groups of scientific users to explore the possibilities of cloud computing in an HPC environment -- a project funded by the European Big Grid collaboration. After the initial phase, the experiment was judged a success, and Big Grid moved to usher in a new phase of self-service computing for the Dutch scientific community. Today, still experimental but much larger in scale, the SARA HPC cloud is offered as a service to the Dutch scientific community of astronomers, particle physicists and many others.
System expert Jeroen Nijhof told us about the project: "The new HPC cloud environment will provide researchers with a chance to operate within their very own virtual private HPC cluster that can host fully individual configurations, and will operate according to each scientific team's needs. The most attractive part of the offer is, of course, on-demand scalability."
Participants can start with their own system images or build a virtual cluster from the bottom up. Indeed, a user can take a copy of their current software environment (from small or personal machines) and weave it into an HPC cluster running within the cloud, without expensive rewriting or editing of their development and production environments.
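In OpenNebula terms, bringing such an existing machine image into the cloud might look like the following image template (an illustrative sketch; the name, path and description are hypothetical):

```
# Illustrative OpenNebula image template, e.g. groupX-node.tpl
NAME        = "groupX-analysis-node"
PATH        = "/scratch/uploads/groupX-node.img"   # the user's uploaded disk image
TYPE        = OS
DESCRIPTION = "Copy of the group's existing software environment"
```

Registered with the `oneimage create` command, an image like this can then back any number of virtual cluster nodes.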
HPC on demand -- getting a lot from a little
The HPC cloud is based on OpenNebula, hosted on a 128-core cluster with the following characteristics:
- 16 compute nodes with dual quad-core CPUs
- 100 TB of managed storage
- Host software:
  - CFEngine 3
  - KVM (multicore/multiprocess, plus MPI and OpenMP)
- Virtual Private Compute Cluster: multiple VMs can be started in their own private network (VLAN)
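The per-group private network in the last item could be described with an OpenNebula virtual network template along these lines (a sketch using current OpenNebula syntax, which postdates the setup described here; the interface name, VLAN ID and addresses are made up):

```
# Illustrative OpenNebula virtual network template for one group's private VLAN
NAME    = "groupX-private"
VN_MAD  = "802.1Q"          # VLAN-tagged network, isolating this cluster's traffic
PHYDEV  = "eth0"            # physical interface carrying the tagged traffic
VLAN_ID = "42"
AR      = [ TYPE = "IP4", IP = "10.0.42.1", SIZE = "64" ]
```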
Nijhof adds, "In this beta phase we strive to offer a small production-grade environment, and we are continuously improving and expanding the technical environment. The software we use is OpenNebula, and we have developed a web-based user interface (ONE-MC) on top of it. We have also developed additional software to manage clusters of virtual machines, enabling Virtual Private Compute Clusters."
Perhaps surprisingly to some, security is particularly important in a scientific setting, so managing security is a major concern in the design. "We've put various mechanisms in place to ensure that users are protected from the outside world and from other users, and vice versa."
"We limit users as little as possible. Each will have their own Virtual Private Compute Cluster that can be configured from scratch. They can actually start with an install CD of the operating system of their choice. It is also possible to start by uploading their own pre-configured virtual machine image. As a service we provide configuration templates and a community repository for virtual machines, where people can share their own configuration templates and virtual machine images."
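A shareable configuration template of the kind such a repository holds could be as simple as an OpenNebula VM template (illustrative; the image and network names are hypothetical):

```
# Illustrative OpenNebula VM template for one worker node of a virtual cluster
NAME   = "groupX-worker"
CPU    = 4
VCPU   = 4
MEMORY = 8192                                # MiB
DISK   = [ IMAGE = "groupX-analysis-node" ]  # pre-configured image from the repository
NIC    = [ NETWORK = "groupX-private" ]      # attach to the cluster's private VLAN
```

Instantiating a template like this several times with `onetemplate instantiate` would yield the worker nodes of a Virtual Private Compute Cluster.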
Because the HPC infrastructure and computing environment are fully configurable, on demand, to specific needs, users can save the time and effort it would take to port their applications to a specific HPC platform. In some cases such porting is impossible because the source code is unavailable to the user; this is where virtualisation provides a solution. Nijhof believes that the new HPC cloud infrastructure service will shorten the time from the posing of a scientific question to its computational answer. "We're convinced that HPC cloud computing provides a flexible solution to scientists and adds value to the HPC ecosystem."
Agile Lifecycle Configuration with CFEngine
So how does CFEngine fit into all this? Nijhof explains: "We invested time in searching for a proper configuration management system, because without one the project would fail under its continuously improving and expanding goals. We chose CFEngine 3 based on previous experience with CFEngine 2, but we needed the flexibility and modularity that only the new CFEngine brings. We had a really tight schedule for implementing and managing the project. CFEngine 3 makes it really simple to add a new node to the cluster and manage its whole lifecycle.
"We were also able to solve another problem. Realizing we did not have a testing environment, due to a lack of hardware, we made one from some production nodes with the help of CFEngine 3. We could easily convert the nodes from production to testing and back again -- so we basically created a testing environment on demand. We divided the CFEngine 3 policy into a global configuration and a custom configuration. That way it's very easy to add other projects besides our cloud into CFEngine 3 as well. Using methods was a really good solution to keep everything clean, simple and modular."
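The global/custom split described here maps naturally onto CFEngine 3's methods promises, roughly like this (a sketch, not SARA's actual policy; the bundle names and classes are invented):

```
bundle agent main
{
  methods:
    any::
      "baseline" usebundle => global_config;      # shared configuration for every node
    cloud_node::
      "cloud"    usebundle => cloud_project;      # project-specific custom policy
    testing::
      "test"     usebundle => testing_overrides;  # flips a production node into the
                                                  # on-demand testing environment
}
```

Because each project lives in its own bundle, adding a new project is a matter of writing one bundle and one methods promise, leaving the global baseline untouched.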
Returning to the issue of security, Nijhof recognizes that scientific users need to be protected against threats, as scientific sites often attract hackers and the integrity of scientific data is at stake. "The other big advantage is using CFEngine 3 not only for configuration management but also for security checks. By monitoring the promises not kept, we can easily see, for instance, which files have been modified, and avoid accidents by proactively repairing possible holes."
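The kind of check he refers to can be expressed as a CFEngine 3 files promise with a change-detection body, roughly as follows (a sketch; the monitored path is only an example, and the body mirrors the `detect_all_change` body shipped with the CFEngine standard library):

```
bundle agent integrity
{
  files:
      "/etc/ssh/sshd_config"            # example of a security-sensitive file
        changes => detect_all_change;   # hash it and report any modification
}

body changes detect_all_change
{
  hash           => "sha256";
  report_changes => "all";
  update_hashes  => "yes";   # re-baseline after reporting, so each change is flagged once
}
```

Any unexpected modification then surfaces as a promise not kept in the agent's reports, which is exactly the signal Nijhof describes monitoring.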