Abstract

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.

Highlights

  • During the past decade, the widespread application of next-generation sequencing technologies [Shendure and Ji, 2008] to the study of human populations has accelerated the rate of identification of human genetic variants [1000 Genomes Project Consortium et al., 2012; Tennessen et al., 2012], but establishing causal relationships between variants and disease phenotypes remains a major challenge

  • Although UniProtKB/Swiss-Prot entries indicate only the current name, “Joubert syndrome”, as proposed by the OMIM database, we provide an exhaustive list of disease synonyms in the humdisease.txt file

  • In order to improve the clarity of disease information and to facilitate its retrieval from UniProtKB, the format of the subsection “Involvement in disease” is highly structured and written using standard phrases and controlled vocabulary

Introduction

Although the widespread application of next-generation sequencing technologies [Shendure and Ji, 2008] to the study of human populations has accelerated the rate of identification of human genetic variants [1000 Genomes Project Consortium et al., 2012; Tennessen et al., 2012], establishing causal relationships between variants and disease phenotypes remains a major challenge. While disease mutations could in principle occur in any functional region of the genome, most recent studies have used exome sequencing technologies to identify those affecting protein-coding regions (see, for instance, the NHLBI exome sequencing project at http://evs.gs.washington.edu/EVS/). In this context, high-quality resources that link genetic and medical information to protein sequences and associated biological knowledge, such as the manually curated section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot), may be extremely valuable. Each UniProtKB/Swiss-Prot entry contains the manually annotated protein sequence(s) encoded by one gene, together with expert-curated functional annotations, mostly gathered from the scientific literature.
