Abstract

Within the ATLAS detector, the Trigger and Data Acquisition system is responsible for the online processing of data streamed from the detector during collisions at the Large Hadron Collider (LHC) at CERN. The online farm is composed of ~4000 servers processing the data read out from ~100 million detector channels through multiple trigger levels. The ability to monitor the ongoing data taking and all the applications involved is essential for debugging and prompt intervention, and hence for efficient data taking. The base of the current web service architecture was designed a few years ago, at the beginning of ATLAS operation (Run 1). It was intended primarily to serve static content from a Network-attached Storage, and it privileged strict security, using separate web servers for internal (ATLAS Technical and Control Network - ATCN) and external (CERN General Purpose Network and public internet) access. Over the years, it has become necessary to complement the static content with an increasing number of dynamic web-based User Interfaces, as they provide new functionality and replace legacy desktop UIs. These are typically served by applications on VMs inside ATCN and made accessible externally via chained reverse HTTP proxies. As the trend towards web UIs continues, the current design has shown its limits, and its increasing complexity has become an issue for maintenance and growth. It is therefore necessary to review the overall web services architecture for ATLAS, taking into account current needs and those of the upcoming LHC Run 3. In this paper, we present our investigation and roadmap to re-design the web services system to better operate and monitor the ATLAS detector, while maintaining the security of critical services, such as the Detector Control System, and preserving the separation of remote monitoring and on-site control according to ATLAS policies.
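
To make the proxy-chain pattern mentioned above concrete, the sketch below shows one hop of such a chain in Go: an externally reachable server forwards a whitelisted path prefix to the next server inward (ultimately the VM inside ATCN serving the dynamic UI). This is an illustration only, not the ATLAS implementation; the hostname, ports, and path prefix are hypothetical.

```go
// One hop of a chained reverse HTTP proxy: requests arriving on this
// server are forwarded to the next server inward. In a chain, the
// "upstream" of the outer proxy is simply another instance like this one.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical next hop: either the inner proxy or the VM inside
	// ATCN that serves the dynamic web UI.
	upstream, err := url.Parse("http://internal-webui.atcn.example:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Expose only a whitelisted path prefix; everything else returns 404.
	mux := http.NewServeMux()
	mux.Handle("/monitoring/", proxy)

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```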

Highlights

  • The online farm of the ATLAS [1] experiment at the Large Hadron Collider (LHC) consists of nearly 4000 servers with various characteristics, processing the data read out from 100 million detector channels through multiple trigger levels

  • Access from the General Purpose Network (GPN) to ATCN and vice versa is only allowed via the ATLAS Gateway servers, which are connected to both ATCN and GPN (see the sketch after this list)

  • It is necessary to ensure the security of critical services, such as the Detector Control System (DCS), and to guarantee the separation of remote monitoring and on-site control, as required by ATLAS policies
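
As an illustration of the dual-homed gateway idea in the highlights above, the following Go sketch relays connections from a GPN-facing interface to a single permitted endpoint inside ATCN. It is a minimal sketch under that assumption, not the actual ATLAS Gateway; all addresses and ports are hypothetical placeholders.

```go
// A dual-homed relay: the process runs on a host with interfaces on
// both networks and forwards traffic from GPN clients to one
// permitted service inside ATCN.
package main

import (
	"io"
	"log"
	"net"
)

// forward relays bytes in both directions between the GPN client and
// the permitted ATCN endpoint, closing both sides when either ends.
func forward(client net.Conn, target string) {
	defer client.Close()
	server, err := net.Dial("tcp", target)
	if err != nil {
		log.Println("ATCN side unreachable:", err)
		return
	}
	defer server.Close()
	go io.Copy(server, client) // GPN -> ATCN
	io.Copy(client, server)    // ATCN -> GPN
}

func main() {
	// Listen on the gateway's GPN-facing interface (hypothetical address).
	ln, err := net.Listen("tcp", "192.0.2.10:2222")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		// Hypothetical single permitted ATCN target.
		go forward(conn, "198.51.100.20:22")
	}
}
```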

Summary

Introduction

The online farm of the ATLAS [1] experiment at the Large Hadron Collider (LHC) consists of nearly 4000 servers with various characteristics, processing the data read out from 100 million detector channels through multiple trigger levels. In case of a disconnection from the GPN, ATLAS is meant to be able to continue taking data for up to a couple of days, as limited by the local storage capacity. To address this requirement, all core services, such as Active Directory, DHCP, DNS, NTP and package repositories, are duplicated inside ATCN. It is necessary to ensure the security of critical services, such as the Detector Control System (DCS), and to guarantee the separation of remote monitoring and on-site control, as required by ATLAS policies. The web service architecture was designed at the beginning of ATLAS operation (Run 1, 2009). An investigation has been started to re-design the web services system for the upcoming LHC Run 3.
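
One way to picture the duplication requirement described above is a probe that checks whether the ATCN-internal replica of a core service still answers, independently of the GPN. The Go sketch below does this for DNS; it is an illustrative check, not from the paper, and the replica address and queried hostname are hypothetical.

```go
// Probe the ATCN-internal DNS replica directly, bypassing whatever
// resolver the host would normally use, to confirm the duplicated
// service answers even if the GPN link is down.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	r := &net.Resolver{
		PreferGo: true,
		// Pin all lookups to the hypothetical internal replica.
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, "192.0.2.53:53")
		},
	}

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	addrs, err := r.LookupHost(ctx, "some-node.atcn.example")
	if err != nil {
		fmt.Println("internal DNS replica unreachable:", err)
		return
	}
	fmt.Println("internal DNS replica OK:", addrs)
}
```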

Current web service architecture
Original requirements and design
Current difficulties and future needs
Proposed architecture
Advantages and disadvantages
Conclusions