XRootD servers are commonplace in many parts of HEP data management and are a key component of data access and management strategies in both the WLCG and OSG. Deployments of XRootD instances across the UK have demonstrated the versatility and expandability of this data management software. As we become more reliant on these services, we need to collect low-level metrics to monitor service performance and behaviour. This presentation will share our recent experiences in the collection, monitoring and analysis of such metrics from servers at UK WLCG-Tier2 sites. Building on work presented at VCHEP-2021, we have been collecting XRootD service metrics from various UK sites. In addition, we have developed novel technologies for recording and verifying metrics from our grid storage systems at the Edinburgh Tier2. Our custom tooling is integrated with our recently deployed monitoring platform. This work complements ongoing WLCG efforts focussing on capturing and analysing XRootD-based traffic flow. By capturing and analysing additional metrics, we are better able to assess service health remotely and to provide insights that assist in debugging. The goal is to offer a centralised resource that monitors the performance and health of remote storage services across the UK. These additional service metrics also give us greater insight into the performance of XRootD Proxy File Caches. By applying machine learning techniques to the metrics already collected, we aim to determine whether gains can be made in optimising the behaviour of these caching systems in production at a UK WLCG-Tier2. A potential outcome is the building and deployment of custom decision-making libraries for use with XRootD Proxy File Cache services.