Abstract
In particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo event generation. However, physicists performing data analyses are usually required to steer their individual workflows manually, which is time-consuming and often leads to undocumented relations between particular workloads. We present the Luigi Analysis Workflows (Law) Python package, which is based on the open-source pipelining tool Luigi, originally developed by Spotify. It establishes a generic design pattern for analyses of arbitrary scale and complexity and shifts the focus from executing the analysis logic to defining it. Law provides the building blocks to seamlessly integrate interchangeable remote resources without limiting itself to a specific choice of infrastructure. In particular, it encourages and enables the separation of analysis algorithms on the one hand from run locations, storage locations, and software environments on the other. To cope with the sophisticated demands of end-to-end HEP analyses, Law supports job execution on WLCG infrastructure (ARC, gLite) as well as on local computing clusters (HTCondor, LSF), remote file access via the most common protocols through the GFAL2 library, and an environment sandboxing mechanism with support for Docker and Singularity containers. Moreover, the novel approach ultimately aims to provide analysis preservation out of the box. Law is entirely experiment-independent and developed as open source.
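To illustrate what separating an analysis algorithm from its storage location can look like, here is a minimal sketch, assuming a hypothetical `Storage` interface with two interchangeable backends: a local directory and an in-memory store that stands in for remote storage. None of these names (`Storage`, `LocalStorage`, `InMemoryStorage`, `count_selected`) belong to Law's API; they only show that the algorithm stays unchanged when the backend is swapped.

```python
from __future__ import annotations

import os
import tempfile
from typing import Protocol


class Storage(Protocol):
    """Hypothetical storage interface (not Law's target API)."""

    def write(self, name: str, data: str) -> None: ...

    def read(self, name: str) -> str: ...


class LocalStorage:
    """Backend that keeps files in a local directory."""

    def __init__(self, base: str) -> None:
        self.base = base

    def write(self, name: str, data: str) -> None:
        with open(os.path.join(self.base, name), "w") as f:
            f.write(data)

    def read(self, name: str) -> str:
        with open(os.path.join(self.base, name)) as f:
            return f.read()


class InMemoryStorage:
    """Stand-in for a remote backend: keeps files in a dict, no network I/O."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def write(self, name: str, data: str) -> None:
        self.files[name] = data

    def read(self, name: str) -> str:
        return self.files[name]


def count_selected(values: list[int], threshold: int, storage: Storage) -> str:
    """Analysis logic: count values above a threshold; storage is injected."""
    result = str(sum(1 for v in values if v > threshold))
    storage.write("count.txt", result)
    return storage.read("count.txt")


data = [3, 7, 1, 9, 4]
local_result = count_selected(data, 3, LocalStorage(tempfile.mkdtemp()))
remote = InMemoryStorage()
remote_result = count_selected(data, 3, remote)
print(local_result, remote_result)  # 3 3
```

The same `count_selected` call works with either backend; supporting another storage location would only require one more class providing the same `write` and `read` methods.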
Highlights
The management of scientific workflows presents a complex challenge in today’s physics working environments
A novel design pattern for the conception and automation of physics analyses is presented that copes with the challenges of inhomogeneous workload definitions and the risks of manual steering
The presented guidelines and tools for the generic conception of analyses constitute a novel approach to coping with the increasing demands of modern high-energy physics data analysis
Summary
The management of scientific workflows presents a complex challenge in today’s physics working environments. In an analysis, the interface between the individual workloads does not rely on an event-by-event data flow. Rather, the workloads form a loose collection of inhomogeneous procedures, encoded in executable files such as shell and Python scripts, and are executed manually. A novel design pattern for the conception and automation of physics analyses is presented that copes with the challenges of inhomogeneous workload definitions and the risks of manual steering. It is based on the pipelining package Luigi [1], chosen for its simple yet scalable and extensible design, which provides guidance on structuring arbitrary workloads. Whereas central experiment workflows often rely on dedicated infrastructure, analyses must instead incorporate existing resources and maintain the ability to adapt to short-term changes.
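The Luigi pattern structures a workload as tasks that each declare their requirements, their output, and how to produce that output; a task counts as complete once its output exists, so finished steps are not rerun. The following toy sketch mimics that pattern in plain Python so that it runs without Luigi installed. `Task`, `build`, `SelectEvents`, and `MakeHistogram` are hypothetical names, not Luigi's or Law's actual API; real Luigi code would subclass `luigi.Task` and return `luigi.LocalTarget` objects from `output()`.

```python
import os
import tempfile


class Task:
    """Toy stand-in for a Luigi-style task (not the real luigi.Task)."""

    def requires(self):
        # Upstream tasks this task depends on; none by default.
        return []

    def output(self):
        # Path of the file this task produces.
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())


def build(task, log):
    """Build requirements depth-first, running only incomplete tasks."""
    for dep in task.requires():
        build(dep, log)
    if not task.complete():
        task.run()
        log.append(type(task).__name__)


WORKDIR = tempfile.mkdtemp()


class SelectEvents(Task):
    # Hypothetical first step: keep the even "events" out of 0..9.
    def output(self):
        return os.path.join(WORKDIR, "selected.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("\n".join(str(x) for x in range(10) if x % 2 == 0))


class MakeHistogram(Task):
    # Hypothetical second step that consumes the output of SelectEvents.
    def requires(self):
        return [SelectEvents()]

    def output(self):
        return os.path.join(WORKDIR, "histogram.txt")

    def run(self):
        with open(self.requires()[0].output()) as f:
            n_entries = len(f.read().split())
        with open(self.output(), "w") as f:
            f.write(f"entries={n_entries}")


run_log = []
build(MakeHistogram(), run_log)
build(MakeHistogram(), run_log)  # second call: outputs exist, nothing reruns
print(run_log)  # ['SelectEvents', 'MakeHistogram']
```

Because completeness is derived from existing outputs, rerunning the top-level task is cheap and only missing steps are executed; this is the property that lets a workflow be resumed instead of being steered manually step by step.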