Abstract

eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite a doubling of the number of genomes, the quality of orthology assignments and functional annotations (80% coverage) has not changed significantly with this update. Finally, we improved the eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for download or via API queries at http://eggnog.embl.de.

Highlights

  • Based on recent benchmarks [29], we adapted our phylogenomic strategy to the following steps: multiple sequence alignments inferred with Clustal Omega [30], soft alignment trimming by removing columns with fewer than five aligned residues, model testing using ModelFinder [31], maximum likelihood trees computed with IQ-TREE [32] and branch supports calculated using the ultrafast bootstrap method [33]
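The soft trimming step mentioned above can be illustrated with a minimal sketch. It assumes the alignment is held as a plain dictionary of equal-length strings and that `-` and `.` mark gaps; the function name `trim_sparse_columns` and its parameters are hypothetical and are not taken from the eggNOG pipeline.

```python
def trim_sparse_columns(alignment, min_residues=5, gap_chars="-."):
    """Drop alignment columns with fewer than `min_residues` aligned residues.

    `alignment` maps sequence names to aligned strings of equal length.
    This is an illustrative helper, not the eggNOG implementation.
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column only if enough sequences have a non-gap character there.
    keep = [
        i for i in range(length)
        if sum(s[i] not in gap_chars for s in seqs) >= min_residues
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment: the third column has a single residue and is removed.
toy = {
    "s1": "MK-LV",
    "s2": "MKALV",
    "s3": "M--LV",
    "s4": "MK-L-",
    "s5": "MK-LV",
    "s6": "-K-LV",
}
trimmed = trim_sparse_columns(toy)
```

Only columns are dropped, so every sequence is retained and the remaining gaps within kept columns are preserved.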

INTRODUCTION

Identifying orthologs, sequences that diverged from a common ancestor after a speciation event, is a fundamental task in molecular and evolutionary biology. Compared to paralogs, which are sequences that diverged after a duplication event, orthologs are more likely to retain their ancestral function [1,2], even over long evolutionary timescales [3]. Differentiating between these two subtypes of homology relationships is crucial for producing accurate functional predictions [2,4,5]. EggNOG focuses on providing: (i) comprehensive functional annotations for the inferred orthologs, (ii) predictions across thousands of genomes covering the three domains of life and viruses, and (iii) hierarchical resolution of orthology assignments and fine-grained relationships (i.e. in-paralogies) based on phylogenetic analysis. We describe eggNOG v5.0, including the following improvements over previous versions: (i) a major upgrade of the underlying databases, featuring one of the most comprehensive selections of prokaryotic, eukaryotic and viral genomes available; (ii) updates to the online service for custom (meta-)genome annotation, including options for fast orthology prediction and improved computational power via cloud computing; and (iii) better visualization options for OGs and their associated functional data.
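To make the ortholog/paralog distinction concrete, the sketch below labels gene pairs in a toy gene tree using a species-overlap heuristic: an internal node whose two child clades share a species is treated as a duplication, otherwise as a speciation. The heuristic, the nested-tuple tree encoding, the `species|gene` leaf naming and the function names are all assumptions made for illustration; the text above states only that assignments are based on phylogenetic analysis.

```python
from itertools import product


def leaves(node):
    """Return all leaf labels below `node` (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def species(leaf):
    """Extract the species from a hypothetical 'species|gene' leaf label."""
    return leaf.split("|", 1)[0]


def classify_pairs(node, out=None):
    """Label each gene pair by the event at its last common ancestor.

    Assumes a strictly bifurcating tree encoded as nested 2-tuples.
    """
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    left, right = node
    l, r = leaves(left), leaves(right)
    # Species overlap between the child clades implies a duplication node.
    shared = {species(x) for x in l} & {species(x) for x in r}
    label = "paralogs" if shared else "orthologs"
    for a, b in product(l, r):
        out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


# Toy tree: a duplication in the human/mouse ancestor, with fly as outgroup.
tree = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "fly|A")
pairs = classify_pairs(tree)
```

In this toy tree, `human|A1` and `mouse|A1` are labelled orthologs, `human|A1` and `mouse|A2` are labelled paralogs because they descend from different copies of the duplication, and `fly|A` is labelled an ortholog of all four genes.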

UPDATES AND ADDITIONS SINCE PREVIOUS RELEASE
Hierarchical consistency of OGs
Phylogenetics analysis
Functional annotations
Findings
CONCLUSIONS AND PERSPECTIVES
