Abstract

The WLCG monitoring system addresses the challenging task of tracking LHC computing activities on the WLCG infrastructure and ensuring the health and performance of the distributed services at more than 170 sites. The challenge is to reduce the effort needed to operate the monitoring service while satisfying constantly growing requirements for scalability and performance. This contribution describes the recent consolidation work aimed at reducing the complexity of the system and at ensuring more effective operations, support and service management. This was done by unifying the implementation of the monitoring components wherever possible. The contribution also covers further steps, such as the evaluation of new technologies for data storage, processing and visualization, and the migration to a new technology stack.

Highlights

  • The Worldwide LHC Computing Grid (WLCG) [1] is a global collaboration of more than 170 computing centres in 41 countries, linking up national and international grid infrastructures

  • Content is rendered using different technologies: XSLT, HTML skeleton, Django templates. It was recommended to have a separation between the server side and the client side as the main design principle for the visualization of the WLCG monitoring applications [12]

  • In eighteen months, the WLCG Monitoring Consolidation achieved a great reduction of the effort required to maintain the WLCG monitoring applications

Introduction

The Worldwide LHC Computing Grid (WLCG) [1] is a global collaboration of more than 170 computing centres in 41 countries, linking up national and international grid infrastructures. Monitoring the status of the sites and services of this distributed infrastructure is a critical and nontrivial task. Several monitoring solutions had been implemented to provide views of different areas. In June 2013, the effort required to support and maintain the WLCG monitoring solutions exceeded what the expected team could provide. The project had to perform a critical analysis of what was monitored, the technologies used, and the deployment and support models, and then propose and implement a technical solution that would offer similar quality of service with reduced effort. The main objective of the project was to reach the point where the effort required for WLCG monitoring could be reduced to half of its initial level. During the first phase (July to October 2013), the group analysed the initial status and agreed on a plan to follow during the second phase (October 2013 to December 2014).

Initial situation
Aggregation There were two main common use cases for data processing:
Conclusion