REANA: A System for Reusable Research Data Analyses

Tibor Šimko, Diego Rodríguez, Dinos Kousidis, Lukas Heinrich, Harri Hirvonsalo

doi:10.1051/epjconf/201921406034

Abstract

The revalidation, reinterpretation and reuse of research data analyses require access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps that researchers used to produce the original scientific results. REANA (Reusable Analyses) is a nascent platform that helps researchers structure their research data analyses with future reuse in mind. The analysis is described by means of a YAML file that captures sufficient information about the analysis assets, parameters and processes. The REANA platform consists of a set of micro-services that launch and monitor container-based computational workflow jobs on the cloud. The REANA user interface and the command-line client enable researchers to easily rerun analysis workflows with new input parameters. The REANA platform aims at supporting several container technologies (Docker), workflow engines (CWL, Yadage), shared storage systems (Ceph, EOS) and compute cloud infrastructures (Kubernetes/OpenStack, HTCondor) used by the community. REANA was developed with the particle physics use case in mind and profits from synergies with general reusable research data analysis patterns in other scientific disciplines, such as bioinformatics and life sciences.

Highlights

The reproducibility of scientific results is crucial for advancing science and testing new theories and hypotheses.
In this paper we describe the nascent REANA platform [6], which aims at facilitating reproducible science practices by leveraging industry-standard container technologies useful for preserving and reinstantiating runtime environments.
The full computational graphs in particle physics analyses can consist of several thousand nodes. REANA supports two workflow systems for describing them: the Common Workflow Language standard [8], which emerged notably in the bioinformatics and life science domains, and the Yadage workflow system [9], which was born in the particle physics community itself.

Summary

Introduction

The reproducibility of scientific results is crucial for advancing science and testing new theories and hypotheses. In the computational data analysis domain, this means capturing full information about the input data and parameters, the analysis software, the operating system runtime environment [2], and the detailed computational steps and recipes by which the researcher performed the analysis and produced the original results. It is necessary to preserve well-structured information about data, software, compute environments, and the associated analysis pipelines in order to facilitate future data reuse [4, 5]. In this paper we describe the nascent REANA platform [6], which aims at facilitating reproducible science practices by leveraging industry-standard container technologies useful for preserving and reinstantiating runtime environments.
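The ingredients listed above (input data and parameters, software, runtime environment, computational steps) can be pictured as one structured record. The following Python sketch is purely illustrative: the class names `Step` and `AnalysisSpec`, their fields, the file paths and the container image tag are all hypothetical and do not reproduce REANA's actual YAML schema or client API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """One computational step: a container image plus the commands run in it."""
    environment: str      # container image tag (hypothetical example value below)
    commands: List[str]


@dataclass
class AnalysisSpec:
    """Hypothetical analysis description; NOT REANA's real schema."""
    input_files: List[str]
    parameters: Dict[str, object]
    steps: List[Step]
    output_files: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Return the names of ingredients that are still empty."""
        parts = {
            "input_files": self.input_files,
            "parameters": self.parameters,
            "steps": self.steps,
            "output_files": self.output_files,
        }
        return [name for name, value in parts.items() if not value]

    def with_parameters(self, **overrides: object) -> "AnalysisSpec":
        """Return a copy with some parameters replaced, leaving the original intact."""
        return AnalysisSpec(
            input_files=list(self.input_files),
            parameters={**self.parameters, **overrides},
            steps=list(self.steps),
            output_files=list(self.output_files),
        )


# Illustrative values only; none of these come from the paper.
spec = AnalysisSpec(
    input_files=["code/analysis.py", "data/events.csv"],
    parameters={"sleeptime": 0, "cut": 1.5},
    steps=[
        Step(
            environment="python:3.11-slim",
            commands=["python code/analysis.py --cut 1.5"],
        )
    ],
    output_files=["results/plot.png"],
)

# Serialise the record so it can be stored alongside the results.
spec_json = json.dumps(asdict(spec), indent=2)

# Rerunning with a new parameter yields a new record.
rerun = spec.with_parameters(sleeptime=2)
```

Keeping the description as plain data, separate from the code that executes it, is what makes a rerun with new input parameters a matter of changing one field rather than editing the analysis itself.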

REANA platform

REANA client

REANA cluster

Examples

Conclusions

Journal: EPJ Web of Conferences	Publication Date: Jan 1, 2019
Citations: 27	License type: CC BY 4.0
