A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses

Andra Waagmeester, Egon L Willighagen, Andrew I Su, Martina Kutmon, Jose Emilio Labra Gayo, Daniel Fernández-Álvarez, Quentin Groom, Peter J Schaap, Lisa M Verhagen, Jasper J Koehorst

doi:10.1186/s12915-020-00940-y

Abstract

Background: Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a “commons.” Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.

Results: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. We demonstrate how this model can be used to make data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, was written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.

Conclusions: Although this workflow was developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43).

Highlights

Pandemics, even more than other medical problems, require swift integration of knowledge
With Shape Expressions (ShEx), we describe the Resource Description Framework (RDF) structure by which Wikidata content is made available
Using the existing Wikidata infrastructure, we developed semantic schemas for virus strains, genes, and proteins, wrote bots in Python to add knowledge on the genes and proteins of the seven human coronaviruses, and linked them to biological pathways in WikiPathways and to primary literature, visualized in Scholia
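
To give a feel for what a schema check does, here is a minimal Python sketch. It is not ShEx and not the authors' schemas: the shape below is a hypothetical, heavily simplified stand-in that only counts how many values an item has for each property. The property IDs are real Wikidata properties (P31 "instance of", P685 "NCBI taxonomy ID"), but the choice of constraints, the names `VIRUS_TAXON_SHAPE` and `check_shape`, and the sample item are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified "shape": property ID -> (min, max) number of values.
# A max of None means "no upper bound". Real entity schemas are written in
# ShEx and can also constrain value types, qualifiers, and references.
Shape = Dict[str, Tuple[int, Optional[int]]]

VIRUS_TAXON_SHAPE: Shape = {
    "P31": (1, None),  # instance of: at least one value
    "P685": (1, 1),    # NCBI taxonomy ID: exactly one value
}


def check_shape(statements: Dict[str, List[str]], shape: Shape) -> List[str]:
    """Return a list of human-readable problems; an empty list means the item conforms."""
    problems = []
    for prop, (minimum, maximum) in shape.items():
        count = len(statements.get(prop, []))
        if count < minimum:
            problems.append(f"{prop}: expected at least {minimum} value(s), found {count}")
        elif maximum is not None and count > maximum:
            problems.append(f"{prop}: expected at most {maximum} value(s), found {count}")
    return problems


# Illustrative item data (assumed for this example, not fetched from Wikidata).
complete_item = {"P31": ["Q16521"], "P685": ["2697049"]}
incomplete_item = {"P31": ["Q16521"]}  # the taxonomy ID is missing

print(check_shape(complete_item, VIRUS_TAXON_SHAPE))
print(check_shape(incomplete_item, VIRUS_TAXON_SHAPE))
```

The point of the sketch is the workflow rather than the mechanics: a schema states which statements an item of a given kind should carry, and a bot or curator can test items against it before or after an update.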

Summary

Introduction

Pandemics, even more than other medical problems, require swift integration of knowledge. The Dutch universities went a step further and want to make any previously published research openly available if it relates in any way to COVID-19 research. This swift release of research findings comes with an increased number of incorrect interpretations [8], which can be problematic when new research articles are picked up by mainstream media [9]. Rapid evaluation of these new research findings and integration with existing resources requires frictionless access to the underlying research data upon which the findings are based. Is a particular gene or protein already described in Wikidata? Using a shared interoperability layer, like Wikidata, different resources can be more easily linked.
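
The question of whether a gene or protein is already described in Wikidata can be answered with a lookup by external identifier. The following is a minimal sketch, not the authors' bot code. It builds a SPARQL query for the public Wikidata Query Service, using the Wikidata property P351 (Entrez Gene ID). The helper names `build_lookup_query` and `build_request_url` are hypothetical, and the gene identifier is only an example value. The sketch constructs the query and request URL but does not send the request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_lookup_query(property_id: str, external_id: str) -> str:
    """Build a SPARQL query that finds items holding a given external identifier."""
    # Accept only well-formed property IDs such as "P351".
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    # Escape backslashes and double quotes so the value stays a valid SPARQL literal.
    escaped = external_id.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f'  ?item wdt:{property_id} "{escaped}" .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Example: look up an item by Entrez Gene ID (the ID is an illustrative value).
query = build_lookup_query("P351", "43740568")
print(query)
print(build_request_url(query))
```

An empty result set for such a query would suggest that the item still needs to be created, while a single hit gives the item to update. The same pattern should apply to other identifier properties, such as P352 (UniProt protein ID) or P685 (NCBI taxonomy ID).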

Journal: BMC Biology	Publication Date: Jan 22, 2021
Citations: 17	License type: open-access
