Abstract

To ensure optimal performance of the LHCb Distributed Computing system, which is based on LHCbDIRAC, it is necessary to inspect the behavior over time of many components: first the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers that this infrastructure manages. This means recording and then analyzing time series of a large number of observables, a task for which SQL relational databases are far from optimal. Within DIRAC we have therefore been studying alternatives based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB). As a result of this study we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure, where it collects data from all the components (agents, services, jobs) and allows reports to be created through Kibana and through a web user interface based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the general DIRAC framework, as well as an overview of the advantages of the pipeline aggregation used to create a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally, we present the performance of the system.
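The dynamic bucketing mentioned in the abstract can be illustrated with a minimal sketch. It is not the DIRAC implementation: the thresholds in `BUCKET_RULES`, the field names `timestamp` and `value`, and the helper names `choose_bucket` and `build_query` are all hypothetical. The request body is written as a plain dictionary in the ElasticSearch query syntax, combining a `date_histogram` with an `avg_bucket` pipeline aggregation. The `fixed_interval` parameter assumes a recent ElasticSearch release; older releases use `interval` instead.

```python
from datetime import datetime, timedelta

# Hypothetical rules: (largest time span covered, bucket width to use).
BUCKET_RULES = [
    (timedelta(days=1), "15m"),
    (timedelta(days=7), "1h"),
    (timedelta(days=31), "6h"),
]
DEFAULT_BUCKET = "1d"  # used for any span longer than the last rule


def choose_bucket(start: datetime, end: datetime) -> str:
    """Pick a bucket width so that long time ranges get coarser buckets."""
    span = end - start
    for max_span, width in BUCKET_RULES:
        if span <= max_span:
            return width
    return DEFAULT_BUCKET


def build_query(start: datetime, end: datetime,
                field: str = "value", time_field: str = "timestamp") -> dict:
    """Build a search body: sum per time bucket, then average over the buckets."""
    return {
        "size": 0,  # only the aggregations are needed, not the documents
        "query": {
            "range": {time_field: {"gte": start.isoformat(), "lt": end.isoformat()}}
        },
        "aggs": {
            "per_bucket": {
                "date_histogram": {
                    "field": time_field,
                    "fixed_interval": choose_bucket(start, end),
                },
                "aggs": {"bucket_sum": {"sum": {"field": field}}},
            },
            # Pipeline aggregation: works on the output of "per_bucket".
            "avg_of_buckets": {
                "avg_bucket": {"buckets_path": "per_bucket>bucket_sum"}
            },
        },
    }
```

Choosing the bucket width from the requested time range keeps the number of points per plot roughly bounded, whatever period the user selects.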

Highlights

  • Data format is key/value pairs defined by the Monitoring type

  • Dedicated Plotter for each Monitoring type

  • ReportGenerator, based on the DIRAC Graph library, creates the plots using the appropriate Plotter

  • Plots are created on the service side using a two-level caching mechanism:

      ❏ DataCache: data used to create the plots is kept in memory
      ❏ FileSystem: plots are stored in the file system
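The two-level caching described above can be sketched as follows. This is an illustration under stated assumptions, not the DIRAC code: the class `TwoLevelPlotCache`, its methods, the `fetch` and `render` callables and the time-to-live value are all hypothetical names chosen for the example. The first level keeps the query results in memory, and the second level keeps the rendered plots as files.

```python
import hashlib
import json
import os
import time


class TwoLevelPlotCache:
    """Hypothetical sketch: in-memory data cache plus on-disk plot cache."""

    def __init__(self, plot_dir, data_ttl=300.0):
        self.plot_dir = plot_dir
        self.data_ttl = data_ttl  # seconds a data entry stays valid (assumed)
        self._data = {}           # request key -> (expiry time, data)
        os.makedirs(plot_dir, exist_ok=True)

    @staticmethod
    def _key(request):
        # Stable key: the same request dictionary always maps to the same hash.
        encoded = json.dumps(request, sort_keys=True).encode("utf-8")
        return hashlib.sha1(encoded).hexdigest()

    def get_data(self, request, fetch):
        """Level 1: return cached data, calling fetch(request) only on a miss."""
        key = self._key(request)
        now = time.time()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        data = fetch(request)
        self._data[key] = (now + self.data_ttl, data)
        return data

    def get_plot(self, request, fetch, render):
        """Level 2: return the path of the plot file, rendering it only once."""
        path = os.path.join(self.plot_dir, self._key(request) + ".png")
        if not os.path.exists(path):
            image_bytes = render(self.get_data(request, fetch))
            with open(path, "wb") as handle:
                handle.write(image_bytes)
        return path
```

With this split, a repeated request for the same plot is served from the file system, while a different plot built from the same data only pays the rendering cost and not the database query.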

Summary

DIRAC Monitoring system

❄ Not designed for real-time monitoring (more for accounting)
❄ Cannot manage semi-structured data
❄ Not suited to real-time analysis
❄ Does not scale to hundreds of millions of rows (more than 500 million)
❄ Not easy to extract information
❄ Not user friendly
❄ Uses very old technology

DIRAC Monitoring System
Data storage
Plot creation within DIRAC framework
Conclusions
