Abstract

SelenoDB (http://www.selenodb.org) aims to provide high-quality annotations of selenoprotein genes, proteins and SECIS elements. Selenoproteins are proteins that contain the amino acid selenocysteine (Sec). The first release of the database included annotations for eight species. Since the release of SelenoDB 1.0, many new animal genomes have been sequenced, and the selenoprotein annotations for these new genomes in major databases usually contain many errors. For this reason, we have now fully annotated selenoprotein genes in 58 animal genomes. We provide manually curated annotations for human selenoproteins, whereas we use an automatic annotation pipeline to annotate selenoprotein genes in other animal genomes. In addition, we annotate the homologous genes containing cysteine (Cys) instead of Sec. Finally, we have surveyed genetic variation in the annotated genes in humans, using exon capture and resequencing approaches to identify single-nucleotide polymorphisms in more than 50 human populations around the world. We thus present a detailed view of the genetic divergence of Sec- and Cys-containing genes in animals and their diversity in humans. The addition of these datasets to the second release of the database provides a valuable resource for addressing medical and evolutionary questions in selenium biology.

Highlights

  • Selenoproteins are proteins that contain the amino acid selenocysteine (Sec) as one of their constituent residues. Sec, the 21st amino acid in the genetic code, is analogous to the amino acid cysteine (Cys) in its molecular structure, with an atom of selenium replacing the sulfur atom of Cys.

  • The dual and seemingly ambiguous nature of UGA codons, which can either terminate translation or encode Sec, makes it difficult to identify and annotate selenoprotein genes using standard gene annotation pipelines. This has led to many annotation errors in the past, because most gene annotation pipelines still rely solely on UGA codons to determine the end of open reading frames (ORFs), which is wrong whenever the UGA codon encodes Sec.

  • SelenoDB 2.0 includes a manually curated annotation of human selenoproteins, Cys-containing homologs and genes involved in the metabolism of selenium and Sec. This annotation is derived from the GENCODE annotation, which we helped to produce [26].
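
The UGA problem described in the highlights above can be illustrated with a small sketch. This is not the SelenoDB annotation pipeline; it is a toy translator, written under the assumption that the positions of Sec-encoding codons are already known and passed in by hand (in real genomes they have to be inferred, for example from SECIS elements). The function name `translate`, the parameter `sec_codon_indices` and the sequence `toy_cds` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, sec_codon_indices=frozenset()):
    """Translate a DNA coding sequence up to the first stop codon.

    A TGA codon whose 0-based codon index is listed in sec_codon_indices
    is read as selenocysteine ("U") instead of a stop. The indices are an
    input here (an assumption of this sketch), not something the function
    can discover on its own.
    """
    protein = []
    for start in range(0, len(cds) - 2, 3):
        codon = cds[start:start + 3].upper()
        if codon == "TGA" and start // 3 in sec_codon_indices:
            protein.append("U")  # recoded UGA: insert Sec and keep going
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":  # ordinary stop codon ends the ORF
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy sequence: ATG GGT TGA GCT TGC TAA
toy_cds = "ATGGGTTGAGCTTGCTAA"

naive = translate(toy_cds)            # treats the in-frame TGA as a stop
sec_aware = translate(toy_cds, {2})   # treats codon 2 (TGA) as Sec

print(naive)      # MG
print(sec_aware)  # MGUAC
```

A pipeline that behaves like the first call truncates the predicted protein at the Sec codon, which is the kind of annotation error the highlights describe.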

Summary

INTRODUCTION

Selenoproteins are proteins that contain the amino acid selenocysteine (Sec) as one of their constituent residues. With SelenoDB 1.0 [6] as the first step in this direction, we correctly annotated selenoprotein genes in a small number of species. This release of the database has contributed to the study of Sec and selenoproteins in the last few years [7,8,9,10,11,12]. These reports include information about gene and protein names, family and subfamily names, the species and its taxonomic classification, and the genomic or protein annotation itself. Even though this first release of SelenoDB had few annotations, it allowed us to develop a robust relational database implemented in MySQL 5.0.

Manual annotation of human selenoprotein genes
Orthology assignment
SECIS annotation
VARIATION DATA
Exome capture and sequencing
SNP calling
NEW INTERFACE FEATURES
Findings
FUTURE DIRECTIONS