CaosDB—Research Data Management for Complex, Changing, and Automated Research Workflows

Timm Fitschen, Daniel Hornung, Henrik Tom Wörden, Stefan Luther, Ulrich Parlitz, Alexander Schlemmer

doi:10.3390/data4020083

Abstract

We present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data in a FAIR way. Its primary purpose is the management of data from the biomedical sciences, from both simulations and experiments, throughout the complete research data lifecycle. An RDMS for this domain faces particular challenges: research data arise in huge amounts, come from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around the workflows and practices of scientists and must therefore support changes in workflows and data structures. Nevertheless, it should encourage and support the development and observance of standards and, furthermore, facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of its data model, the CaosDB Server, and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it.

Highlights

Despite the technological advances over the last decades, the scientific community still faces the problem of storing and accessing scientific data in a structured and future-proof manner [1,2,3,4]. Although principles for good scientific data management have since been formulated under the acronym FAIR [5] and are widely recognized in the community, real-life obstacles tend to prevent their widespread adoption.
The Application Programming Interface (API) must be built around a transparent, human-readable protocol with RESTful identifiers [11]. This API can be used by libraries and clients that can be integrated into existing data management workflows.
In this article, we presented our approach to improving research data management in heterogeneous scientific environments.

Summary

Introduction

Despite the technological advances over the last decades, the scientific community still faces the problem of storing and accessing scientific data in a structured and future-proof manner [1,2,3,4]. Although principles for good scientific data management have since been formulated under the acronym FAIR [5] and are widely recognized in the community, real-life obstacles tend to prevent their widespread adoption. In cross-disciplinary environments, the interaction between different user groups, e.g., numerical scientists conducting simulation studies and experimenters working in the laboratory, often leads to highly inhomogeneous approaches to data management. For such heterogeneous data, inefficiencies become inevitable when different kinds of data have to be combined in a joint research project or when data have to be accessed by scientists who were not involved in recording and storing them. As a result, data can become de facto inaccessible once their creators can no longer be reached.

Results

Discussion

Conclusion