ClinEpiDB: an open-access clinical epidemiology database resource encouraging online exploration of complex studies.

Emmanuel Ruhamyankaka, Sheena Shah Tomko, Brianna Lindsay, Omar S Harb, John Judkins, Grant Dorsey, Emmanuel James San, David S Roos, Christian J Stoeckert, Jie Zheng, Jessica C Kissinger, Danica A Helb, Brian P Brunk

doi:10.12688/gatesopenres.13087.2

Abstract

The concept of open data has been gaining traction as a mechanism to increase data use, ensure that data are preserved over time, and accelerate discovery. While epidemiology data sets are increasingly deposited in databases and repositories, barriers to access remain. ClinEpiDB was constructed as an open-access online resource for clinical and epidemiologic studies by leveraging the extensive web toolkit and infrastructure of the Eukaryotic Pathogen Database Resources (EuPathDB; a collection of databases covering 170+ eukaryotic pathogens, relevant related species, and select hosts) combined with a unified semantic web framework. Here we present an intuitive point-and-click website that allows users to visualize and subset data directly in the ClinEpiDB browser and immediately explore potential associations. Supporting study documentation aids contextualization, and data can be downloaded for advanced analyses. By facilitating access to and interrogation of high-quality, large-scale data sets, ClinEpiDB aims to spur collaboration and discovery that improves global health.

Highlights

Large-scale epidemiological data sets offer immense potential for secondary data discovery and translational research, provided the data are Findable, Accessible, Interoperable, and Reusable (FAIR) (Wilkinson et al., 2016).
The Clinical and Epidemiology Database (ClinEpiDB) resource was developed within this landscape as an open-access online tool to help investigators quickly explore data from complex epidemiological studies. It distinguishes itself from repositories and study-specific websites in two key ways, the first of which is that it maps data to common ontologies, creating a unified semantic framework that applies to all integrated studies, even those with different disease foci.
Journals and funders increasingly require that data be made publicly available (National Institutes of Health, 2003; The Wellcome Trust, 2011), but data hidden in supplementary data files or stored in data repositories are often difficult to locate, interpret, or use by those not actively engaged in the study.

Summary

Introduction

Large-scale epidemiological data sets offer immense potential for secondary data discovery and translational research, provided the data are Findable, Accessible, Interoperable, and Reusable (FAIR) (Wilkinson et al., 2016). Data repositories such as Dryad, dbGaP, and, to a more limited extent, ICPSR support the deposition of epidemiology data and metadata for download and secondary use by other researchers. The Clinical and Epidemiology Database (ClinEpiDB) resource was developed within this landscape as an open-access online tool to help investigators quickly explore data from complex epidemiological studies. It distinguishes itself from repositories and study-specific websites in two key ways. First, ClinEpiDB maps data to common ontologies, creating a unified semantic framework that applies to all integrated studies, even those with different disease foci. Second, tools and visualizations are available for all studies, and aggregate data are generally publicly accessible.
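To illustrate why mapping study variables to common ontology terms matters, the following is a minimal sketch of the general idea, not ClinEpiDB's actual implementation. All variable names, the `EX:` term identifiers, the helper functions `harmonize` and `query`, and the 38.0 threshold are hypothetical placeholders invented for this example. It shows how two studies that label the same concept differently can be searched with a single query once their records are re-keyed by shared term identifiers.

```python
# Hypothetical illustration only: the variable names and term identifiers
# below are invented placeholders, not real ClinEpiDB or ontology entries.

# Each study labels the same concept differently; both map to shared terms.
STUDY_A_TO_TERM = {"temp_c": "EX:0001", "age_yrs": "EX:0002"}
STUDY_B_TO_TERM = {"body_temperature": "EX:0001", "participant_age": "EX:0002"}


def harmonize(record, mapping):
    """Re-key one study record by shared term identifier, dropping unmapped fields."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def query(records, term_id, predicate):
    """Return harmonized records whose value for term_id satisfies predicate."""
    return [r for r in records if term_id in r and predicate(r[term_id])]


# Toy records, made up for this sketch.
study_a = [{"temp_c": 38.6, "age_yrs": 4}, {"temp_c": 36.9, "age_yrs": 7}]
study_b = [{"body_temperature": 39.1, "participant_age": 2, "site": "X"}]

combined = [harmonize(r, STUDY_A_TO_TERM) for r in study_a] + [
    harmonize(r, STUDY_B_TO_TERM) for r in study_b
]

# One query now spans both studies (38.0 is an arbitrary illustrative cutoff).
febrile = query(combined, "EX:0001", lambda t: t >= 38.0)
print(len(febrile))
```

Because the mapping is kept separate from the records, integrating a further study in this sketch would only require a new mapping table; the query code is unchanged.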

Methods

Results

Conclusion