Abstract

In particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo event generation. However, physicists performing data analyses are usually required to steer their individual workflows manually, which is time-consuming and often leads to undocumented relations between particular workloads. We present the Luigi Analysis Workflows (Law) Python package, which is based on the open-source pipelining tool Luigi, originally developed by Spotify. It establishes a generic design pattern for analyses of arbitrary scale and complexity, and shifts the focus from executing to defining the analysis logic. Law provides the building blocks to seamlessly integrate interchangeable remote resources without limiting itself to a specific choice of infrastructure. In particular, it encourages and enables the separation of analysis algorithms on the one hand and run locations, storage locations, and software environments on the other. To cope with the sophisticated demands of end-to-end HEP analyses, Law supports job execution on WLCG infrastructure (ARC, gLite) as well as on local computing clusters (HTCondor, LSF), remote file access via the most common protocols through the GFAL2 library, and an environment sandboxing mechanism with support for Docker and Singularity containers. Moreover, the novel approach ultimately aims for analysis preservation out of the box. Law is entirely experiment-independent and developed as open source. It is successfully used in tt̄H cross section measurements and searches for di-Higgs boson production with the CMS experiment.
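To make the separation of analysis logic from run location, storage, and software environment more concrete, the following is a minimal sketch of a Law task, assuming the task, sandbox, and file-target interfaces described in the Law documentation; the task name, the dataset parameter, the container image, and the output path are purely illustrative and not taken from the paper.

import luigi
import law


class SelectEvents(law.SandboxTask):
    """Illustrative event selection step, used only to sketch the pattern."""

    dataset = luigi.Parameter(description="name of the dataset to process")

    # the software environment is declared rather than hard-coded into the
    # algorithm; law is expected to execute run() inside this container
    sandbox = "docker::python:3.9"

    def output(self):
        # the storage location lives in the target definition; replacing this
        # local target with a remote one leaves run() untouched
        return law.LocalFileTarget(f"selected_{self.dataset}.json")

    def run(self):
        # placeholder for the actual selection algorithm
        result = {"dataset": self.dataset, "n_selected": 0}
        self.output().dump(result, formatter="json")

In this pattern, submitting the task to a batch system or pointing its output to remote storage would change only the task declaration, not the selection algorithm itself.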

Highlights

  • The management of scientific workflows presents a complex challenge in today’s physics working environments

  • A novel design pattern for the conception and automation of physics analyses is presented that copes with the challenges of inhomogeneous workload definition and the risks introduced by manual steering

  • The presented guidelines and tools for generic analysis conception constitute a novel approach to coping with the increasing demands of modern high-energy physics data analysis

Summary

Introduction

The management of scientific workflows presents a complex challenge in today's physics working environments. The interfaces between these workloads do not rely on an event-by-event data flow; instead, the workloads form a loose collection of inhomogeneous procedures, encoded in executable files such as shell and Python scripts, and are executed manually. A novel design pattern for the conception and automation of physics analyses is presented that copes with the challenges of inhomogeneous workload definition and the risks arising from manual steering. It is based on the pipelining package Luigi [1], chosen for its simple yet scalable and extensible design, and provides guidance on structuring arbitrary workloads. Whereas central experiment workflows often rely on dedicated infrastructure, analyses must instead incorporate existing resources and retain the ability to adapt to short-term changes.
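As an illustration of this design pattern, the sketch below expresses two workloads as Luigi tasks whose dependency is declared once through requires() and resolved automatically, instead of being steered by hand; the Reconstruct and Histogram tasks and their JSON outputs are illustrative examples, not workloads from the paper.

import json
import luigi


class Reconstruct(luigi.Task):
    """First workload: produces an intermediate result."""

    run_number = luigi.IntParameter()

    def output(self):
        return luigi.LocalTarget(f"reco_{self.run_number}.json")

    def run(self):
        with self.output().open("w") as f:
            json.dump({"run": self.run_number, "events": []}, f)


class Histogram(luigi.Task):
    """Second workload: consumes the output of Reconstruct."""

    run_number = luigi.IntParameter()

    def requires(self):
        # the dependency is declared here; Luigi runs Reconstruct first
        # and skips it if its output already exists
        return Reconstruct(run_number=self.run_number)

    def output(self):
        return luigi.LocalTarget(f"hist_{self.run_number}.json")

    def run(self):
        with self.input().open("r") as f:
            events = json.load(f)["events"]
        with self.output().open("w") as f:
            json.dump({"n_events": len(events)}, f)


if __name__ == "__main__":
    # requesting the final task triggers the full dependency chain
    luigi.build([Histogram(run_number=1)], local_scheduler=True)

Because each workload declares its inputs and outputs explicitly, the relations between workloads are documented in code and can be re-executed reproducibly rather than by manual bookkeeping.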

Guidelines
Analysis Workflow Management
Conclusions
