Common data model for COVID-19 datasets.

Philipp Wegner, Geena Mariya Jose, Sepehr Golriz Khatami, Bide Zhang, Alpha Tom Kodamullil, Martin Hofmann-Apitius, Marc Jacobs, Stephan Springstubbe, Thomas Linden, Bruce Schultz, Vanessa Lage-Rupprecht, Cindy Ku

doi:10.1093/bioinformatics/btac651


Open Access



Abstract

A global medical crisis like the coronavirus disease 2019 (COVID-19) pandemic requires interdisciplinary and highly collaborative research from all over the world. One of the key challenges for collaborative research is the lack of interoperability among heterogeneous data sources. Interoperability, standardization and mapping of datasets are necessary for data analysis and for applications in advanced algorithms such as personalized risk prediction modeling. To ensure interoperability and compatibility among COVID-19 datasets, we present here a common data model (CDM) built from 11 different COVID-19 datasets from various geographical locations. The current version of the CDM holds 4639 data variables related to COVID-19, including basic patient information (age, biological sex and diagnosis) as well as disease-specific data variables, for example, anosmia and dyspnea. Each data variable in the model is associated with specific data types, variable mappings, value ranges, data units and data encodings that can be used for standardizing any dataset. Moreover, its compatibility with established data standards like OMOP and FHIR makes the CDM well suited for COVID-19 data interoperability. The CDM is available in a public repository: https://github.com/Fraunhofer-SCAI-Applied-Semantics/COVID-19-Global-Model. Supplementary data are available at Bioinformatics online.
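
To make the idea of standardizing a dataset against such a model concrete, the following is a minimal sketch in Python. It does not reproduce the actual schema of the published CDM: the class `CdmVariable`, its field names, the three example variables, the source column names and the encodings are all hypothetical and chosen only to illustrate how a data type, variable mappings, a value range, a unit and an encoding could be attached to one variable and applied to a raw record.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class CdmVariable:
    """One illustrative model entry (hypothetical, not the real CDM schema)."""
    name: str                                   # harmonized variable name
    dtype: type                                 # target data type
    source_names: Tuple[str, ...]               # variable mappings (source columns)
    value_range: Optional[Tuple[float, float]] = None
    unit: Optional[str] = None
    encoding: Dict[Any, Any] = field(default_factory=dict)


def standardize(record: Dict[str, Any], variables: List[CdmVariable]) -> Dict[str, Any]:
    """Map one raw record onto the harmonized variable names."""
    out: Dict[str, Any] = {}
    for var in variables:
        # Variable mapping: take the first source column present in the record.
        raw = next((record[s] for s in var.source_names if s in record), None)
        if raw is None:
            continue  # variable not collected in this dataset
        # Encoding: translate source codes, then cast to the target type.
        value = var.dtype(var.encoding.get(raw, raw))
        # Value range: reject values outside the allowed interval.
        if var.value_range is not None:
            low, high = var.value_range
            if not low <= value <= high:
                raise ValueError(f"{var.name}={value} outside [{low}, {high}]")
        out[var.name] = value
    return out


# Hypothetical model entries and a hypothetical raw record.
MODEL = [
    CdmVariable("age", int, ("age", "patient_age"), value_range=(0, 120), unit="years"),
    CdmVariable("biological_sex", str, ("sex", "gender"),
                encoding={"m": "male", "f": "female"}),
    CdmVariable("anosmia", int, ("anosmia", "loss_of_smell"),
                encoding={"yes": 1, "no": 0}),
]

raw_record = {"patient_age": "54", "gender": "f", "loss_of_smell": "yes"}
print(standardize(raw_record, MODEL))
```

Under these assumptions, two datasets that name the same variable differently (for example `age` and `patient_age`) end up with identical harmonized keys, which is the property that makes downstream pooling and modeling possible.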
