Abstract

The outbreak of the COVID-19 epidemic has focused enormous attention on the genetics of viral infection and related disease. Since the beginning of the pandemic, we have focused on the collection and integration of SARS-CoV-2 databases, which contain information on the structure of the virus and on its ability to spread, mutate, and evolve; the data are made available from several open-source databases. In the past, we gathered experience with human genomics data by building models and integrated databases of genomic datasets (representing, e.g., mutations, gene expression profiles, epigenetic signals). We also coordinated the development of a data dictionary describing the clinical phenotype of COVID-19, in the context of a very large consortium. The main objective of this paper is to describe the content of the data dictionary and the process of data collection and organization. We also argue that, in the context of COVID-19, interoperability between the three domains of viral genomics, clinical phenotype, and human host genomics is essential for enabling important analysis processes and results. We call for actions that could be performed to link these data.

Highlights

  • The outbreak of COVID-19 has presented novel challenges to the research community, pushed by the intent of rapidly mitigating the pandemic effects.

  • In addition to the already cited cooperative efforts, we wish to mention the COVID-19 Host Genetics Initiative [29], which aims at gathering an open community of thousands of researchers who produce, share, and analyze data to learn the genetic determinants of COVID-19 susceptibility, severity, and outcomes.

  • One of the COVID-19 Host Genetics Initiative analyses discriminates between mild, severe, and critical COVID-19 disease severity based on a set of EncounterSymptoms and HospitalizationCourse conditions, whilst another analysis distinguishes cases and controls based on Comorbidities and AdmissionSymptoms.


Summary

Background

We described an abstract model that represents both the data (embedding the VCM) and the external knowledge that is being collected about SARS-CoV-2. This includes notions on variants, their effects (in terms of disease severity, transmissibility, vaccine escape, etc.), their composition (in terms of sets of mutations), the peculiarities of mutations due to their original and alternative nucleotide or amino acid residues, and the definition of particular regions of the genome with given functions. The results are produced as browsable tables of sequences and epitopes, described by their metadata. They can be downloaded as text files and embedded in bioinformatic pipelines. The models and systems are general enough to consider many different signals of the human genome, including studies that may be useful to represent COVID-19-related problems.
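
To make the notions listed above more concrete, the following is a minimal sketch of how variants, mutations, and functional regions could be related in code. It is not the schema used by the authors: the class names (`Mutation`, `Region`, `Variant`), their fields, and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Mutation:
    """A change from an original to an alternative residue (hypothetical fields)."""
    reference: str    # coordinate system: a genome or a protein name
    position: int     # 1-based position within that reference
    original: str     # original nucleotide or amino acid residue
    alternative: str  # alternative nucleotide or amino acid residue

    def label(self) -> str:
        return f"{self.reference}:{self.original}{self.position}{self.alternative}"


@dataclass(frozen=True)
class Region:
    """A stretch of a reference sequence with a given function."""
    name: str
    reference: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    function: str

    def contains(self, mutation: Mutation) -> bool:
        return (mutation.reference == self.reference
                and self.start <= mutation.position <= self.end)


@dataclass
class Variant:
    """A named set of mutations with free-text notes on its effects."""
    name: str
    mutations: FrozenSet[Mutation]
    effects: Dict[str, str] = field(default_factory=dict)

    def mutations_in(self, region: Region) -> List[Mutation]:
        """Return the variant's mutations that fall inside a region, by position."""
        hits = (m for m in self.mutations if region.contains(m))
        return sorted(hits, key=lambda m: m.position)


# Toy values, invented for this example; they describe no real variant.
m1 = Mutation("ProteinX", 10, "A", "V")
m2 = Mutation("ProteinX", 250, "K", "N")
toy_region = Region("toy_domain", "ProteinX", 1, 100, "placeholder function")
toy_variant = Variant(
    "ToyVariant",
    frozenset({m1, m2}),
    {"transmissibility": "placeholder note"},
)
labels = [m.label() for m in toy_variant.mutations_in(toy_region)]
```

In this sketch a variant is composed of a set of mutations, each mutation records its original and alternative residue, and a region lets one ask which mutations of a variant fall in a part of the sequence with a known function. Effects are kept as free text because the source does not specify how they are encoded.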

Clinical Aspects of COVID-19
The COVID-19 Phenotype Data Dictionary
Cooperative Construction of the Dictionary
Proposed Model
Host Genotype and Host Phenotype
Viral Genotype and Host Conditions
Findings
Discussion
