Abstract

The ALICE Experiment at CERN’s Large Hadron Collider (LHC) is undertaking a major upgrade during LHC Long Shutdown 2 (2019-2021), which includes a new computing system called O2 (Online-Offline). To ensure the efficient operation of the upgraded experiment and of its newly designed computing system, a reliable, high-performance, and automated experiment control system is being developed. The ALICE Experiment Control System (AliECS) is a distributed system based on state-of-the-art cluster management and microservices technologies that have recently emerged in the distributed computing ecosystem. These technologies will allow the ALICE collaboration to benefit from a vibrant and innovative open source community. This communication describes the AliECS architecture. It provides an in-depth overview of the system’s components, features, and design elements, as well as its performance. It also reports on the experience with AliECS as part of ALICE Run 3 detector commissioning setups.

Highlights

  • The ALICE experiment [1] is undergoing a major upgrade [2] that is being deployed during the Large Hadron Collider (LHC)’s Long Shutdown 2 (2019-2021) in preparation for LHC Run 3

  • Since synchronous workflows operate on data coming from detector data links, they must run in the O2 facility at the LHC Point 2

  • Asynchronous workflows do not have this constraint, so they can run at any time on WLCG nodes, or on O2 facility resources when they are not needed for synchronous operation

Summary

The O2 Computing System

The ALICE experiment [1] is undergoing a major upgrade [2] that is being deployed during the LHC’s Long Shutdown 2 (2019-2021) in preparation for LHC Run 3. Since synchronous workflows operate on data coming from detector data links, they must run in the O2 facility at LHC Point 2. Unlike First Level Processors (FLPs), which host the first portion of the data flow, Event Processing Nodes (EPNs) do not have physical links to detector hardware. They are instead configured as homogeneous computing nodes that operate as a second level of data processing after the FLPs. While O2 is developed as a complete solution for the data processing needs of the ALICE experiment during Run 3, the O2 computing system is split into two separate computing clusters because the requirements of FLPs and EPNs differ significantly. This partition yields the O2/FLP computing cluster and the O2/EPN computing cluster, both deployed at LHC Point 2. The O2 project is an opportunity to take advantage of modern developments in computing: AliECS follows the best practices of the microservices paradigm for distributed applications and harnesses the features of modern cluster resource management solutions.
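
The placement constraint on synchronous and asynchronous workflows can be illustrated with a minimal sketch. It is not AliECS code: the names `WorkflowKind`, `Site`, and `eligible_sites`, and the boolean flag `o2_busy_with_sync`, are hypothetical and only encode the rule that synchronous workflows are tied to the O2 facility, while asynchronous ones may use WLCG nodes at any time and O2 facility resources when these are not needed for synchronous operation.

```python
from enum import Enum


class WorkflowKind(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


class Site(Enum):
    O2_FACILITY = "O2 facility (LHC Point 2)"
    WLCG = "WLCG nodes"


def eligible_sites(kind: WorkflowKind, o2_busy_with_sync: bool) -> list[Site]:
    """Return the sites where a workflow of the given kind may run.

    Hypothetical helper: `o2_busy_with_sync` stands in for whatever signal
    tells a scheduler that O2 resources are needed for synchronous operation.
    """
    if kind is WorkflowKind.SYNCHRONOUS:
        # Synchronous workflows consume data from detector data links,
        # so they can only run in the O2 facility.
        return [Site.O2_FACILITY]
    # Asynchronous workflows can always run on WLCG nodes ...
    sites = [Site.WLCG]
    # ... and on O2 facility resources when synchronous operation does not need them.
    if not o2_busy_with_sync:
        sites.append(Site.O2_FACILITY)
    return sites
```

Calling `eligible_sites(WorkflowKind.ASYNCHRONOUS, o2_busy_with_sync=True)` under these assumptions leaves only the WLCG option, which mirrors the priority that synchronous operation has on the O2 facility.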

Requirements of an ECS solution for ALICE Run 3
AliECS design overview
AliECS Components
AliECS Concepts
Configuration Management
O2 Process Control
AliECS in Run 3 detector commissioning
Conclusion