Abstract
The revalidation, reinterpretation and reuse of research data analyses require access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps that researchers used to produce the original scientific results. REANA (Reusable Analyses) is a nascent platform that enables researchers to structure their research data analyses with a view to future reuse. The analysis is described by means of a YAML file that captures sufficient information about the analysis assets, parameters and processes. The REANA platform consists of a set of micro-services that launch and monitor container-based computational workflow jobs on the cloud. The REANA user interface and the command-line client enable researchers to easily rerun analysis workflows with new input parameters. The REANA platform aims at supporting several container technologies (Docker), workflow engines (CWL, Yadage), shared storage systems (Ceph, EOS) and compute cloud infrastructures (Kubernetes/OpenStack, HTCondor) used by the community. REANA was developed with the particle physics use case in mind and profits from synergies with general reusable research data analysis patterns in other scientific disciplines, such as bioinformatics and life sciences.
Highlights
The reproducibility of scientific results is crucial for advancing science and testing new theories and hypotheses
In this paper we describe the nascent REANA platform [6] that aims at facilitating reproducible science practices by leveraging industry-standard container technologies useful for preserving and reinstantiating runtime environments
The full computational graphs in particle physics analyses can consist of several thousand nodes. REANA supports two workflow systems: the Common Workflow Language standard [8], which emerged notably in the bioinformatics and life science domains, and the Yadage workflow system [9], which was born in the particle physics community itself
Summary
The reproducibility of scientific results is crucial for advancing science and testing new theories and hypotheses. In the computational data analysis domain, this means capturing the full information about input data and parameters, the analysis software, the operating system runtime environment [2], and the detailed computational steps and recipes by which the researcher performed the analysis and produced the original results. It is necessary to preserve well-structured information about data, software, compute environments, and the associated analysis pipelines in order to facilitate future data reuse [4, 5]. In this paper we describe the nascent REANA platform [6] that aims at facilitating reproducible science practices by leveraging industry-standard container technologies useful for preserving and reinstantiating runtime environments.
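The ingredients listed above (input data and parameters, software, runtime environment, and computational steps) can be pictured as one structured record. The following is a minimal Python sketch under stated assumptions: the class names, field names, and example values are hypothetical illustrations and do not reproduce the actual REANA YAML schema or any REANA API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """One computational step: a command run inside a container image."""
    name: str
    environment: str  # container image name (hypothetical example values below)
    command: str      # command template; {placeholders} are filled from parameters


@dataclass
class AnalysisDescription:
    """Hypothetical stand-in for a structured, reusable analysis description."""
    input_files: List[str] = field(default_factory=list)
    parameters: Dict[str, object] = field(default_factory=dict)
    steps: List[Step] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """List the ingredients that are absent and would block a later rerun."""
        missing = []
        if not self.input_files:
            missing.append("input data")
        if not self.steps:
            missing.append("computational steps")
        for step in self.steps:
            if not step.environment:
                missing.append(f"environment for step '{step.name}'")
        return missing

    def with_parameters(self, **overrides: object) -> "AnalysisDescription":
        """Return a copy with some parameters replaced, leaving the original intact."""
        merged = {**self.parameters, **overrides}
        return AnalysisDescription(list(self.input_files), merged, list(self.steps))

    def rendered_commands(self) -> List[str]:
        """Fill each step's command template with the current parameters."""
        return [step.command.format(**self.parameters) for step in self.steps]


# Hypothetical analysis: one fitting step with a single tunable parameter.
analysis = AnalysisDescription(
    input_files=["data/events.csv"],
    parameters={"threshold": 0.5},
    steps=[Step("fit", "python:3.11", "python fit.py --threshold {threshold}")],
)

# Rerunning with a new input parameter does not modify the original description.
rerun = analysis.with_parameters(threshold=0.8)
```

Keeping the description immutable on rerun mirrors the goal stated above: the original analysis stays preserved, while a reinterpretation is expressed as the same steps and environments with different parameters.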