Abstract

Orthology, refining the concept of homology, is the cornerstone of evolutionary comparative studies. With the ever-increasing availability of genomic data, inference of orthology has become instrumental for generating hypotheses about gene functions crucial to many studies. This update of the OrthoDB hierarchical catalog of orthologs (http://www.orthodb.org) covers 3027 complete genomes, including the most comprehensive set of 87 arthropods, 61 vertebrates, 227 fungi and 2627 bacteria (sampling the most complete and representative genomes from over 11,000 available). In addition to the most extensive integration of functional annotations from UniProt, InterPro, GO, OMIM, model organism phenotypes and COG functional categories, OrthoDB uniquely provides evolutionary annotations including rates of ortholog sequence divergence, copy-number profiles, sibling groups and gene architectures. We re-designed the entirety of the OrthoDB website from the underlying technology to the user interface, enabling the user to specify species of interest and to select the relevant orthology level by the NCBI taxonomy. The text searches allow use of complex logic with various identifiers of genes, proteins, domains, ontologies or annotation keywords and phrases. Gene copy-number profiles can also be queried. This release comes with the freely available underlying ortholog clustering pipeline (http://www.orthodb.org/software).
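A copy-number profile records how many genes each species contributes to an ortholog group. The minimal Python sketch below shows what a profile query such as "single-copy in (most of) the selected species" has to check. It is not OrthoDB's API or data format: the `OrthologGroup` class, the `is_single_copy_in` helper and the toy gene IDs are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OrthologGroup:
    """Hypothetical ortholog group: species name -> list of member gene IDs."""
    group_id: str
    members: dict[str, list[str]] = field(default_factory=dict)

    def copy_numbers(self, species: list[str]) -> dict[str, int]:
        """Copy-number profile over `species` (0 where the species is absent)."""
        return {sp: len(self.members.get(sp, [])) for sp in species}


def is_single_copy_in(group: OrthologGroup, species: list[str],
                      min_fraction: float = 1.0) -> bool:
    """True if the group has exactly one gene in at least `min_fraction` of `species`."""
    if not species:
        return False
    profile = group.copy_numbers(species)
    single = sum(1 for n in profile.values() if n == 1)
    return single / len(species) >= min_fraction


# Toy data: the gene IDs are placeholders, not real OrthoDB entries.
species = ["Drosophila melanogaster", "Anopheles gambiae", "Apis mellifera"]
groups = [
    OrthologGroup("OG1", {"Drosophila melanogaster": ["g1"],
                          "Anopheles gambiae": ["g2"],
                          "Apis mellifera": ["g3"]}),
    OrthologGroup("OG2", {"Drosophila melanogaster": ["g4", "g5"],
                          "Anopheles gambiae": ["g6"]}),
]
hits = [g.group_id for g in groups if is_single_copy_in(g, species)]
print(hits)  # ['OG1']
```

Lowering `min_fraction` relaxes the query to "single-copy in most species", which tolerates occasional gene loss or duplication in a few lineages.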

Summary

INTRODUCTION

Orthology is the cornerstone of comparative genomics and gene function prediction. Gene sequence data from a wide variety of species are accumulating quickly, and the gap between these sequence data and experimental functional data is widening. We retrieved over 11,000 bacterial genomes from Ensembl Bacteria (Release 22, May 2014) and selected 2627 of them with the most complete annotations and the best sampling of genetic diversity, assessed using a set of universal single-copy genes and our BUSCO pipeline (Simão et al., submitted). With this update we also provide the suite of programs for delineating orthologous genes that was developed for, and underlies, the OrthoDB hierarchical catalog of orthologs. Alternative clusterings (groupings) can be compared in several respects: (i) by the number of special cases of group-level fusions and splits, considering only high-precision matches, which are less undesirable than complex matches or …

Nucleic Acids Research, 2015, Vol. 43, Database issue, D253
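The genome-selection step can be illustrated with a minimal Python sketch. It assumes each genome has already been searched for a set of universal single-copy marker genes and each marker labelled complete single-copy (S), complete duplicated (D), fragmented (F) or missing (M), the categories BUSCO reports. The `GenomeAssessment` class, the `select_genomes` helper, the toy genome IDs and the per-genus cap are hypothetical; the cap is a crude stand-in for diversity-aware sampling, not the criterion actually used for OrthoDB.

```python
from collections import defaultdict
from dataclasses import dataclass

# Marker status codes, modelled on BUSCO's categories.
SINGLE, DUPLICATED, FRAGMENTED, MISSING = "S", "D", "F", "M"


@dataclass
class GenomeAssessment:
    """Hypothetical per-genome result of a universal single-copy marker search."""
    genome_id: str
    genus: str
    marker_status: dict[str, str]  # marker ID -> one of S, D, F, M

    def completeness(self) -> float:
        """Fraction of markers found complete (single-copy or duplicated)."""
        if not self.marker_status:
            return 0.0
        complete = sum(1 for s in self.marker_status.values()
                       if s in (SINGLE, DUPLICATED))
        return complete / len(self.marker_status)


def select_genomes(assessments, per_genus=2, min_completeness=0.9):
    """Keep at most `per_genus` of the most complete genomes in each genus.

    Genomes below `min_completeness` are discarded first; ties in
    completeness are broken by genome ID so the result is deterministic.
    """
    by_genus = defaultdict(list)
    for a in assessments:
        if a.completeness() >= min_completeness:
            by_genus[a.genus].append(a)
    selected = []
    for genus in sorted(by_genus):
        ranked = sorted(by_genus[genus],
                        key=lambda a: (-a.completeness(), a.genome_id))
        selected.extend(a.genome_id for a in ranked[:per_genus])
    return selected


# Toy data with four markers per genome (IDs are placeholders).
genomes = [
    GenomeAssessment("ecoli_A", "Escherichia", {"m1": "S", "m2": "S", "m3": "S", "m4": "S"}),
    GenomeAssessment("ecoli_B", "Escherichia", {"m1": "S", "m2": "D", "m3": "S", "m4": "F"}),
    GenomeAssessment("ecoli_C", "Escherichia", {"m1": "S", "m2": "S", "m3": "D", "m4": "S"}),
    GenomeAssessment("ecoli_D", "Escherichia", {"m1": "S", "m2": "S", "m3": "S", "m4": "M"}),
    GenomeAssessment("bsub_A", "Bacillus", {"m1": "S", "m2": "S", "m3": "S", "m4": "S"}),
]
print(select_genomes(genomes, per_genus=2, min_completeness=0.75))
# ['bsub_A', 'ecoli_A', 'ecoli_C']
```

Duplicated markers count as complete here, as in BUSCO's overall "C" score; a stricter filter could also penalise high duplication, which often signals contamination or assembly problems.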
