Abstract

The revalidation, reinterpretation and reuse of research data analyses require access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps that researchers used to produce the original scientific results. REANA (Reusable Analyses) is a nascent platform that enables researchers to structure their research data analyses with future reuse in mind. The analysis is described by means of a YAML file that captures sufficient information about the analysis assets, parameters and processes. The REANA platform consists of a set of micro-services that launch and monitor container-based computational workflow jobs on the cloud. The REANA user interface and the command-line client enable researchers to easily rerun analysis workflows with new input parameters. The REANA platform aims to support several container technologies (Docker), workflow engines (CWL, Yadage), shared storage systems (Ceph, EOS) and compute cloud infrastructures (Kubernetes/OpenStack, HTCondor) used by the community. REANA was developed with the particle physics use case in mind and profits from synergies with general reusable research data analysis patterns in other scientific disciplines, such as bioinformatics and life sciences.

Highlights

  • The reproducibility of scientific results is crucial for advancing science and testing new theories and hypotheses

  • In this paper we describe the nascent REANA platform [6] that aims to facilitate reproducible science practices by leveraging industry-standard container technologies useful for preserving and reinstantiating runtime environments

  • The full computational graphs in particle physics analyses can consist of several thousand nodes. REANA supports two workflow systems: the Common Workflow Language standard [8], which emerged notably in the bioinformatics and life science domains, and the Yadage workflow system [9], which was born in the particle physics community itself



Introduction

The reproducibility of scientific results is crucial for advancing science and testing new theories and hypotheses. In the computational data analysis domain, this means capturing the full information about input data and parameters, the analysis software, the operating system runtime environment [2], and the detailed computational steps and recipes by which the researcher performed the analysis and produced the original results. It is necessary to preserve well-structured information about data, software, compute environments, and the associated analysis pipelines in order to facilitate future data reuse [4, 5]. In this paper we describe the nascent REANA platform [6] that aims to facilitate reproducible science practices by leveraging industry-standard container technologies useful for preserving and reinstantiating runtime environments.
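
As a rough illustration of what "capturing the full information" could look like in practice, the following Python sketch records the four ingredients named above (input data, parameters, runtime environment, computational steps) in one structure. The class name, field names and the example values are hypothetical and chosen only for this illustration; they are not REANA's actual YAML schema or API.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class AnalysisDescription:
    """Hypothetical record of an analysis; not REANA's real schema."""

    input_files: List[str]                # experimental datasets and code
    parameters: Dict[str, object]         # tunable input parameters
    environment: str                      # e.g. a container image name
    steps: List[str] = field(default_factory=list)  # ordered commands

    def missing_parts(self) -> List[str]:
        """Return the names of ingredients that were left empty."""
        parts = {
            "input_files": self.input_files,
            "parameters": self.parameters,
            "environment": self.environment,
            "steps": self.steps,
        }
        return [name for name, value in parts.items() if not value]

    def with_parameters(self, **overrides: object) -> "AnalysisDescription":
        """Return a copy with some parameters changed, for a rerun."""
        return replace(self, parameters={**self.parameters, **overrides})


# Illustrative values only (assumed file names and image tag).
original = AnalysisDescription(
    input_files=["data/events.csv", "code/fit.py"],
    parameters={"bins": 10},
    environment="python:3.11-slim",
    steps=["python code/fit.py --bins 10"],
)
rerun = original.with_parameters(bins=20)
```

Keeping the description immutable and deriving reruns as modified copies means the record of the original analysis is never overwritten when new parameters are tried.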

REANA platform
REANA client
REANA cluster
Examples
Conclusions