PanDA Workload Management System Meta-data Segmentation

M Golosova, E Ryabinkin, A Klimentov, M Grigorieva

doi:10.1016/j.procs.2015.11.051


Open Access

https://doi.org/10.1016/j.procs.2015.11.051


Abstract

The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. PanDA currently distributes jobs among more than 100,000 cores at well over 120 Grid sites, supercomputing centers, and commercial and academic clouds. ATLAS physicists submit more than 1.5 M data processing, simulation and analysis PanDA jobs per day, and the system keeps all meta-information about job submissions and execution events in an Oracle RDBMS. This information is used for monitoring and accounting purposes. One of the most challenging monitoring issues is tracking errors that have occurred during job execution. The current meta-data storage technology does not provide built-in tools for the data aggregation needed to build error summary tables, charts and graphs. Delegating these tasks to the monitor slows down the execution of requests. We describe a project aimed at optimizing the interaction between the PanDA front-end and back-end by segmenting the meta-data storage into two parts: operational and archived. Active meta-data remain in the Oracle database (operational part) because of the high requirements for data integrity. Historical (read-only) meta-data used for system analysis and accounting are exported to NoSQL storage (archived part). A new data model, using Cassandra as the NoSQL backend, has been designed as a set of query-specific data structures. This removes most of the data preparation workload from PanDA Monitor and improves its scalability and performance. Segmentation and synchronization between the operational and archived parts of the job meta-data are provided by a Hybrid Meta-data Storage Framework (HMSF). PanDA Monitor was partly adapted to interact with HMSF. Operational data queries are forwarded to the primary SQL-based repository, and analytic data requests are processed by the NoSQL database.
The results of performance and scalability tests of the HMSF-adapted part of PanDA Monitor show that the presented optimization method, in conjunction with a properly configured NoSQL database and a reasonable data model, provides performance improvements and scalability.
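
The segmentation and query routing described in the abstract can be illustrated with a minimal in-memory sketch. It is not the HMSF implementation: the class name `HybridStore`, the set `FINAL_STATES`, the record fields (`status`, `site`, `error`) and the query kinds are all hypothetical, and plain Python dictionaries stand in for the Oracle and Cassandra back-ends. The sketch assumes that jobs in a final state are read-only and can be archived, and that the error summary is maintained at write time as a query-specific structure, so the monitor reads it without aggregating.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical set of final (read-only) job states; an assumption of this sketch.
FINAL_STATES = {"finished", "failed", "cancelled"}


@dataclass
class HybridStore:
    # Stand-in for the operational SQL part (active job meta-data).
    operational: dict = field(default_factory=dict)
    # Stand-in for the archived NoSQL part (historical, read-only meta-data).
    archived: dict = field(default_factory=dict)
    # Query-specific structure: error counts keyed by (site, error),
    # updated when a job is archived rather than computed at query time.
    error_summary: Counter = field(default_factory=Counter)

    def upsert(self, job_id, record):
        """Store or update meta-data of an active job in the operational part."""
        self.operational[job_id] = record

    def archive_finished(self):
        """Move jobs in a final state to the archive and update the summary."""
        moved = [jid for jid, rec in self.operational.items()
                 if rec["status"] in FINAL_STATES]
        for jid in moved:
            rec = self.operational.pop(jid)
            self.archived[jid] = rec
            if rec.get("error"):
                self.error_summary[(rec["site"], rec["error"])] += 1
        return len(moved)

    def query(self, kind, **params):
        """Route operational queries to one part and analytic ones to the other."""
        if kind == "job_status":
            # Operational query: answered from the active-job store.
            return self.operational[params["job_id"]]["status"]
        if kind == "error_summary":
            # Analytic query: answered from the precomputed summary.
            site = params["site"]
            return {err: n for (s, err), n in self.error_summary.items()
                    if s == site}
        raise ValueError(f"unknown query kind: {kind}")
```

A short usage example under the same assumptions:

```python
store = HybridStore()
store.upsert(1, {"status": "running", "site": "A", "error": None})
store.upsert(2, {"status": "failed", "site": "A", "error": "timeout"})
store.archive_finished()
print(store.query("job_status", job_id=1))
print(store.query("error_summary", site="A"))
```

The point of the design is that the aggregation cost is paid once, when a record moves to the archive, instead of on every monitoring request.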

Full Text