Abstract

Experiments in high-energy physics (HEP) rely on elaborate hardware, software and computing systems to sustain the high data rates necessary to study rare physics processes. The Institut für Experimentelle Kernphysik (EKP) at KIT is a member of the CMS and Belle II experiments, located at the LHC and Super-KEKB accelerators, respectively. These detectors share the requirement that enormous amounts of measurement data must be processed and analyzed, and that a comparable number of simulated events is needed to compare experimental results with theoretical predictions.

Classical HEP computing centers are dedicated sites which support multiple experiments and have the required software pre-installed. Nowadays, funding agencies encourage research groups to participate in shared HPC cluster models, where scientists from different domains use the same hardware to increase synergies. This shared usage proves challenging for HEP groups due to their specialized software setup, which includes a custom OS (often Scientific Linux), libraries and applications.

To overcome this hurdle, the EKP and the data center team of the University of Freiburg have developed a system to enable the HEP use case on a shared HPC cluster. To achieve this, an OpenStack-based virtualization layer is installed on top of a bare-metal cluster. While other user groups can run their batch jobs via the Moab workload manager directly on bare metal, HEP users can request virtual machines with a specialized machine image which contains a dedicated operating system and software stack. In contrast to similar installations, this hybrid setup requires no static partitioning of the cluster into a physical and a virtualized segment.

As a unique feature, the placement of the virtual machines on the cluster nodes is scheduled by Moab, and the job lifetime is coupled to the lifetime of the virtual machine. This allows for seamless integration with the jobs sent by other user groups and honors the fairshare policies of the cluster. The thin integration layer developed between OpenStack and Moab can be adapted to other batch servers and virtualization systems, making the concept applicable for other cluster operators as well.

This contribution reports on the concept and implementation of an OpenStack-virtualized cluster used for HEP workflows. While the full cluster will be installed in spring 2016, a test-bed setup with 800 cores has been used to study the overall system performance, and dedicated HEP jobs were run in a virtualized environment over many weeks. Furthermore, the dynamic integration of the virtualized worker nodes, depending on the workload at the institute's computing system, is described.
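To make the lifetime coupling concrete, the following is a minimal sketch (in Python, using the openstacksdk library) of what the batch-job payload of such an integration layer could look like: the job boots a VM, blocks while the VM is active, and tears it down when the job ends. The cloud name "nemo", the image name "ekp-sl6-worker", the flavor choice and the PBS_JOBID naming are illustrative assumptions, not details taken from the actual implementation.

    import os
    import time

    import openstack  # openstacksdk; credentials come from clouds.yaml

    def run_vm_job():
        # "nemo" is a hypothetical cloud entry in clouds.yaml.
        conn = openstack.connect(cloud="nemo")

        # Hypothetical HEP machine image and flavor names.
        image = conn.compute.find_image("ekp-sl6-worker")
        flavor = conn.compute.find_flavor("m1.large")

        # Boot the VM. The batch scheduler has already reserved the matching
        # resources, so OpenStack only has to place the instance.
        # (Network selection is omitted for brevity.)
        server = conn.compute.create_server(
            name="vm-job-" + os.environ.get("PBS_JOBID", "local"),
            image_id=image.id,
            flavor_id=flavor.id,
        )
        server = conn.compute.wait_for_server(server)

        try:
            # Block while the VM is active: as long as this batch job runs,
            # the VM's resources remain accounted to it by the scheduler,
            # which is how fairshare policies are honored.
            while conn.compute.get_server(server.id).status == "ACTIVE":
                time.sleep(60)
        finally:
            # Job end or cancellation tears the VM down, coupling the
            # lifetimes of the batch job and the virtual machine.
            conn.compute.delete_server(server.id)

    if __name__ == "__main__":
        run_vm_job()

Because the whole payload is an ordinary batch job, the same pattern can be resubmitted to a different workload manager with only the resource-request syntax changing, which is what makes the layer adaptable to other batch servers.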

Highlights

  • Current and upcoming experiments in High Energy Physics (HEP) require large amounts of processing and storage capacity to handle the recorded data and to provide sufficient simulation and analysis capabilities to their scientists

  • The Institut für Experimentelle Kernphysik (EKP) at KIT is a member of the Compact Muon Solenoid (CMS) and Belle II experiments, located at the LHC and the Super-KEKB accelerators, respectively

  • This shared usage proves to be challenging for high-energy physics (HEP) groups due to their specialized software setup, which includes a custom OS, libraries and applications


Summary

Introduction

Current and upcoming experiments in High Energy Physics (HEP) require large amounts of processing and storage capacity to handle the recorded data and to provide sufficient simulation and analysis capabilities to their scientists. Since users in the hybrid cluster are able to submit jobs to the scheduler for bare-metal computation, the scheduler must also be responsible for the resources allocated by virtual machines. The scheduler handles a request for a new virtual machine like any other cluster job and is not required to have knowledge of the virtualization environment. This makes the overall hybrid HPC cluster concept flexible and independent of the actual scheduler used. The requested resource values are read by the job script and mapped to a matching OpenStack flavor. This way, users can define the size of the virtual machine at the time they submit the job to the scheduler, and the scheduler remains aware of the resources used by the virtual machine. A flexible batch system as well as a central cloud manager are necessary to realize this approach.
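A minimal sketch of the flavor-mapping step is shown below, assuming Torque/Moab-style environment variables. PBS_NUM_PPN is a real Torque variable for cores per node, but the memory variable JOB_MEM_GB and the flavor table are hypothetical placeholders; the real mapping would use whatever resource values the site's scheduler exposes.

    import os

    # Assumed mapping from (cores, memory in GB) to OpenStack flavor names.
    FLAVORS = {
        (4, 8): "hep.small",
        (8, 16): "hep.medium",
        (16, 64): "hep.large",
    }

    def select_flavor() -> str:
        cores = int(os.environ.get("PBS_NUM_PPN", "4"))
        mem_gb = int(os.environ.get("JOB_MEM_GB", "8"))  # hypothetical variable
        # Pick the smallest flavor that satisfies the requested resources.
        for (c, m), name in sorted(FLAVORS.items()):
            if c >= cores and m >= mem_gb:
                return name
        raise ValueError(
            "no flavor fits request: %d cores, %d GB" % (cores, mem_gb)
        )

    if __name__ == "__main__":
        print(select_flavor())

The key design point is that the user never talks to OpenStack directly: the VM size is expressed entirely through the scheduler's resource request, so the scheduler's accounting and the VM's actual footprint stay consistent by construction.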

A Flexible Batch System
A Cloud Meta-Scheduler
Job Flow
Conclusion