Abstract
The new unified monitoring architecture (MONIT) for the CERN Data Centres and for the WLCG Infrastructure is based on established open source technologies to collect, stream, store and access monitoring data. The previous solutions, based on in-house development and commercial software, have been replaced with widely recognized technologies such as Collectd, Kafka, Spark, Elasticsearch, InfluxDB, Grafana and others. The monitoring infrastructure, fully based on CERN cloud resources, covers the whole workflow of the monitoring data: from collecting and validating metrics and logs to making them available for dashboards, reports and alarms. The deployment in production of this new DC and WLCG monitoring is well under way and this contribution provides a summary of the progress, hurdles met and lessons learned in using these open source technologies. It also focuses on the choices made to achieve the required levels of stability, scalability and performance of the MONIT monitoring service.
Highlights
The CERN Data Centres (DC) and the Worldwide Large Hadron Collider (LHC) Computing Grid (WLCG) [1] have been monitored, for more than a decade, with in-house central solutions that gather large amounts of metrics and logging information and store them in the CERN storage facilities.
The monitoring of the CERN Data Centres, in Geneva and in Wigner, has used LEMON (LHC Era Monitoring for Large-Scale Infrastructure) [2], a client/server-based monitoring system developed in the Laboratory.
The MONIT infrastructure provides a set of standard ways to integrate data from different protocols and technologies, from both the Data Centres and the WLCG resources.
Summary
The CERN Data Centres (DC) and the Worldwide LHC Computing Grid (WLCG) [1] have been monitored, for more than a decade, with in-house central solutions that gather large amounts of metrics and logging information and store them in the CERN storage facilities. The two monitoring systems were developed separately, supported by different software teams at CERN, with few common software components. The collected information, typically metrics and the status of the services running on a node, is forwarded to a central repository where data are curated, stored and displayed in quasi real time. At the end of 2015, it was decided to take advantage of the major refactoring activities of the two monitoring services and, after merging the two CERN teams, to aim at a single Unified Monitoring service satisfying both Data Centres and WLCG Infrastructure requirements.
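To make the "collect, validate, then forward" step more concrete, the following is a minimal sketch in Python of how incoming metric documents might be checked before they reach a central repository. It is illustrative only: the field names in `REQUIRED_FIELDS` and the helper names `validate_metric` and `curate` are assumptions made for this example, not the MONIT schema or API.

```python
import json

# Hypothetical minimal schema; the real MONIT document format is not given here.
REQUIRED_FIELDS = ("producer", "type", "host", "timestamp", "value")


def validate_metric(raw: str) -> dict:
    """Parse one JSON-encoded metric and check it against the assumed schema.

    Raises ValueError when the document is malformed.
    """
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise ValueError("metric must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    value = doc["value"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("value must be numeric")
    return doc


def curate(raw_messages):
    """Split raw messages into accepted documents and rejected (raw, reason) pairs."""
    accepted, rejected = [], []
    for raw in raw_messages:
        try:
            accepted.append(validate_metric(raw))
        except ValueError as exc:
            rejected.append((raw, str(exc)))
    return accepted, rejected
```

In a real deployment the accepted documents would be handed to a transport and storage layer (the abstract names Kafka, Elasticsearch and InfluxDB), while rejected ones would typically be logged or routed elsewhere for inspection; neither step is shown here.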