Abstract

The primary goal of the online cluster of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is to build event data from the detector and to select interesting collisions in the High Level Trigger (HLT) farm for offline storage. With more than 1500 nodes and a capacity of about 850 kHEPSpecInt06, the HLT machines provide computing capacity comparable to that of all the CMS Tier-1 Grid sites combined. Moreover, the cluster is currently connected to the CERN IT datacenter via a dedicated 160 Gbps network connection and can therefore access the remote EOS-based storage at high bandwidth. In the last few years, a cloud overlay based on OpenStack has been commissioned to make these resources available to the WLCG when they are not needed for data taking. This online cloud facility was designed for parasitic use of the HLT and must never interfere with its primary function as part of the DAQ system. It also abstracts away the different types of machines and their underlying segmented networks. During LHC technical stop periods, the HLT cloud is set to its static mode of operation, in which it acts like any other Grid facility. The online cloud was also extended to make dynamic use of resources during the periods between LHC fills. These periods are a priori unscheduled and of undetermined length, typically several hours, occurring once or more per day. To cope with this, the cloud dynamically follows the LHC beam states and hibernates Virtual Machines (VMs) accordingly. Finally, this work presents the design and implementation of a mechanism to dynamically ramp up VMs when the DAQ load on the HLT decreases towards the end of a fill.
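
Both the beam-state-driven hibernation and the end-of-fill ramp-up can be pictured as a single control loop. The sketch below is only an illustration under stated assumptions: it uses openstacksdk for the VM suspend/resume actions, and the 'hlt-cloud' configuration entry, the state names, the thresholds, and the get_lhc_beam_state()/get_daq_cpu_load() helpers are all hypothetical stand-ins for the real daemons, not the paper's actual implementation.

```python
# Illustrative control loop for beam-state-driven VM hibernation and the
# end-of-fill ramp-up. Assumptions: openstacksdk reaches the overlay through
# an 'hlt-cloud' clouds.yaml entry; get_lhc_beam_state() and
# get_daq_cpu_load() are hypothetical stand-ins for the monitoring sources.
import itertools
import time

import openstack

# Hypothetical subset of LHC machine states during which CMS takes data.
DATA_TAKING_STATES = {"RAMP", "FLAT TOP", "SQUEEZE", "ADJUST", "STABLE BEAMS"}
RAMP_UP_THRESHOLD = 0.5  # assumed DAQ CPU-load fraction that triggers ramp-up
RAMP_UP_BATCH = 50       # assumed number of VMs resumed per iteration
POLL_SECONDS = 60


def get_lhc_beam_state() -> str:
    """Placeholder: read the LHC beam state from the monitoring sources."""
    raise NotImplementedError


def get_daq_cpu_load() -> float:
    """Placeholder: read the average CPU load of the HLT under the DAQ."""
    raise NotImplementedError


def main() -> None:
    conn = openstack.connect(cloud="hlt-cloud")
    while True:
        state = get_lhc_beam_state()
        if state not in DATA_TAKING_STATES:
            # Between fills: resume every hibernated VM so Grid jobs continue.
            for server in conn.compute.servers(status="SUSPENDED"):
                conn.compute.resume_server(server)
        elif state == "STABLE BEAMS" and get_daq_cpu_load() < RAMP_UP_THRESHOLD:
            # Towards the end of a fill the DAQ load drops, so a batch of VMs
            # can already be resumed while data taking continues.
            suspended = conn.compute.servers(status="SUSPENDED")
            for server in itertools.islice(suspended, RAMP_UP_BATCH):
                conn.compute.resume_server(server)
        else:
            # The DAQ needs the full HLT: hibernate all running cloud VMs.
            for server in conn.compute.servers(status="ACTIVE"):
                conn.compute.suspend_server(server)
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```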

Highlights

  • Deployed at the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), the CMS Online Cluster provides the computing power needed for data acquisition and for the selection of interesting collision events for later offline storage and analysis

  • Cloud-collect is a simple daemon that polls several monitoring sources to collect information such as the LHC beam state, the CMS DAQ status, and the average DAQ CPU load, and stores it in a MariaDB database (a minimal sketch of such a poller follows this list)

  • Adapting to DAQ infrastructure changes: since the Online Cloud makes parasitic use of the DAQ High Level Trigger (HLT) resources, it must be deployed on top of whatever operating system, network, and computing hardware are in use for data taking
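
The following sketch illustrates a Cloud-collect-style poller as described in the second highlight. It is a minimal illustration, not the actual daemon: the fetch_metrics() helper, the cloud_status table and its schema, and the connection parameters are all hypothetical stand-ins for the real monitoring interfaces and database layout described in the paper.

```python
# Minimal sketch of a Cloud-collect-style poller. Assumptions: fetch_metrics()
# is a hypothetical stand-in for the real monitoring sources; the cloud_status
# table and the connection parameters are invented for illustration.
import time

import pymysql

POLL_SECONDS = 60


def fetch_metrics() -> dict:
    """Placeholder: query the LHC and DAQ monitoring sources."""
    raise NotImplementedError


def main() -> None:
    # Hypothetical MariaDB connection; MariaDB speaks the MySQL wire
    # protocol, so a plain MySQL client library such as PyMySQL works.
    db = pymysql.connect(host="localhost", user="cloudcollect",
                         password="secret", database="cloud")
    with db:
        while True:
            m = fetch_metrics()
            with db.cursor() as cur:
                cur.execute(
                    "INSERT INTO cloud_status"
                    " (ts, beam_state, daq_status, daq_cpu_load)"
                    " VALUES (NOW(), %s, %s, %s)",
                    (m["beam_state"], m["daq_status"], m["daq_cpu_load"]),
                )
            db.commit()
            time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```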

Summary

Introduction

Deployed at the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), the CMS Online Cluster provides the computing power needed for data acquisition and for the selection of interesting collision events for later offline storage and analysis. The High Level Trigger (HLT) farm is the largest resource of this cluster, with more than 1500 nodes summing up to 850 kHEPSpecInt06 of CPU capacity, for a total of 37k physical cores (74k virtual cores if counting hyper-threads). This is comparable to the amount of processing resources provided by all the CMS Tier-1 Grid sites together. In our previous work [1], we showed how to build a cloud overlay on top of the HLT farm capable of running Grid jobs. It provided proper isolation, but a solution for fast turnaround and job resuming was still being designed. We will also show the contribution the overlay has made to CMS offline data processing.

The Online Cloud overlay
DAQ and Cloud daemons
VM hibernation and resuming
Cloud operation modes
Contribution to CMS offline processing
Future work
Opportunistic usage of the CMS Online cluster using a cloud overlay
