Abstract

In the Grid world, there are many tools for monitoring both activities and infrastructure. The huge amount of information available needs to be well organized, especially given the need to react promptly when problems affect the activities of a large Virtual Organization. Such activities include data taking, data reconstruction, data reprocessing and user analysis. The monitoring system for LHCb Grid computing relies on many heterogeneous and independent sources of information. These offer different views for a better understanding of problems, while an operations team follows defined procedures that have been put in place to handle them. This work summarizes the state of the art of LHCb Grid operations, emphasizing the reasons that led to the various choices and the tools currently in use to run our daily activities. We highlight the most common problems experienced over years of activity on the WLCG infrastructure, the services and their criticality, the procedures in place, the relevant metrics, the tools available and the ones still missing.
