WLCG Tier-2 Research Articles

Software development projects at Edinburgh identified a need to build and manage our own monitoring platform, which better allows us to support the evolving and varied physics and computing interests of our Experimental Particle Physics group. This production platform provides oversight of international experimental data management, local software development projects, and active monitoring of lab facilities within our research group. Larger sites such as CERN have the resources to support general-purpose centralised monitoring solutions such as MONIT; at a WLCG Tier-2 we have access to only a fraction of these resources and manpower. Recycling nodes from grid storage and borrowing capacity from our Tier-2 hypervisors has enabled us to build a reliable monitoring infrastructure, which in turn improves the operational and security monitoring of our Tier-2. Shared experiences from larger sites gave us a head start in building our own service monitoring (Fluentd) and multi-protocol (AMQP/STOMP/UDP datagram) messaging frameworks on top of both our Elasticsearch and OpenSearch clusters. The platform has been built with minimal hardware and software complexity, maximising maintainability and reducing manpower costs. A secondary design goal has been the ability to migrate and upgrade individual components with minimal service interruption; to achieve this, we made heavy use of several layers of containerisation (Podman/Docker), virtualisation and NGINX web proxies. This presentation details our experience of developing this platform from scratch with a focus on minimal resource use, including lessons learnt in deploying and comparing Elasticsearch and OpenSearch clusters and in designing various levels of automation and resiliency for our monitoring framework. The platform now indexes, parses and stores more than 200 GB of logging and monitoring data per day.
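One of the messaging transports named above is plain UDP datagrams. The following is a minimal sketch, not the authors' implementation: it assumes monitoring records are sent as compact JSON, and the record fields (`host`, `metric`, `value`, `ts`) and helper names are hypothetical. A local socket stands in for the collector so the example runs on its own. It also converts the quoted 200 GB/day into an average ingest rate.

```python
import json
import socket
import time


def make_record(host: str, metric: str, value: float) -> bytes:
    """Serialise one monitoring record as a compact UTF-8 JSON payload.

    Hypothetical field names; not taken from the Edinburgh platform.
    """
    record = {"host": host, "metric": metric, "value": value, "ts": time.time()}
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def send_record(payload: bytes, addr: tuple) -> None:
    """Fire-and-forget send: UDP gives no delivery guarantee or acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)


def average_rate_bytes_per_s(bytes_per_day: float) -> float:
    """Average sustained rate implied by a daily volume (86 400 s per day)."""
    return bytes_per_day / 86_400


if __name__ == "__main__":
    # Local stand-in for a collector listening on UDP (hypothetical setup).
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(2.0)
    send_record(make_record("node01", "disk_used_pct", 71.5), receiver.getsockname())
    data, _ = receiver.recvfrom(65535)
    receiver.close()
    print(json.loads(data)["metric"])
    print(f"{average_rate_bytes_per_s(200e9) / 1e6:.1f} MB/s")
```

Run as a script, this prints `disk_used_pct` followed by `2.3 MB/s`, i.e. 200 GB/day corresponds to an average of roughly 2.3 MB/s before any burst headroom. UDP suits high-volume, loss-tolerant metrics because it needs no connection or acknowledgement; records that must not be lost are better carried over a broker protocol such as AMQP or STOMP.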

The Computing Center of the Institute of Physics (CC IoP) of the Czech Academy of Sciences serves a broad spectrum of users with various computing needs. It runs a WLCG Tier-2 center for the ALICE and ATLAS experiments; the same group of services is used by the astroparticle physics projects the Pierre Auger Observatory (PAO) and the Cherenkov Telescope Array (CTA). An OSG stack is installed for the NOvA experiment. Other user groups use the local batch system directly. Storage capacity is distributed across several locations. The DPM servers used by ATLAS and the PAO are all in the same server room, but several xrootd servers for the ALICE experiment are operated at the Nuclear Physics Institute in Řež, about 10 km away. The storage capacity for ATLAS and the PAO is extended by resources of CESNET, the Czech National Grid Initiative representative. These resources are in Plzen and Jihlava, more than 100 km away from the CC IoP. Both distant sites use a hierarchical storage solution based on disks and tapes. They run one common dCache instance, which is published in the CC IoP BDII. ATLAS users can access these resources with the standard ATLAS tools in the same way as the local storage, without noticing the geographical distribution. The computing clusters LUNA and EXMAG, dedicated mostly to users from the Solid State Physics departments, offer resources for parallel computing. They are part of the Czech NGI infrastructure MetaCentrum, whose distributed batch system is based on TORQUE with a custom scheduler. The clusters are installed remotely by the MetaCentrum team, and a local contact helps only when needed. Users from IoP have exclusive access to only part of these two clusters and benefit from higher priorities on the rest (1500 cores in total), which can also be used by any MetaCentrum user. IoP researchers can also use remote resources located in several towns of the Czech Republic, with a total capacity of more than 12000 cores.

Articles published on WLCG Tier-2

WLCG Tier-2 Computing Center at NRC “Kurchatov Institute”—IHEP: 20 Years of Operation

Building a Flexible and Resource-Light Monitoring Platform for a WLCG-Tier2

Czech national e-infrastructure services for HEP

A Blueprint for a Contemporary Storage Element, building a new WLCG storage system with widely available hardware and software components: Ceph, XRootD, and Prometheus

Provision and use of GPU resources for distributed workloads via the Grid

Distributed resources of Czech WLCG Tier-2 center

Enabling ATLAS big data processing on Piz Daint at CSCS

Using the Autopilot pattern to deploy container resources at a WLCG Tier-2

IPv6 in production: its deployment and usage in WLCG

Using ZFS to manage Grid storage and improve middleware resilience

A container model for resource provision at a WLCG Tier-2

Stealth Cloud: How not to waste CPU during grid to cloud transitions

A multipurpose computing center with distributed resources

Evaluation of ZFS as an efficient WLCG storage backend

Optimisation of the usage of LHC and local computing resources in a multidisciplinary physics department hosting a WLCG Tier-2 centre

Managing competing elastic Grid and Cloud scientific computing applications using OpenNebula

Enabling IPv6 at FZU - WLCG Tier2 in Prague

A Voyage to Arcturus: A model for automated management of a WLCG Tier-2 facility

Implementation of Grid Tier 2 and Tier 3 facilities on a Distributed OpenStack Cloud

Integrating multiple scientific computing needs via a Private Cloud infrastructure
