Abstract
BackgroundAn orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications.ResultsHere we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Differently from previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method into a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance.ConclusionThe presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on Github at: https://github.com/meringlab/og_consistency_pipeline.
Highlights
An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA)
From the initial definition of orthology and paralogy by Walter Fitch [1], which distinguishes whether two genes diverged from their last common ancestor by speciation or duplication, the concept has been expanded to the notion of orthologous group (OG) [2]
The latter aims to represent a set of genes from two or more species that are in a homologous relationship with respect to their last common ancestor at a given speciation event
Summary
An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). From the initial definition of orthology and paralogy by Walter Fitch [1], which distinguishes whether two genes diverged from their last common ancestor by speciation or duplication, the concept has been expanded to the notion of orthologous group (OG) [2] The latter aims to represent a set of genes from two or more species that are in a homologous relationship with respect to their last common ancestor at a given speciation event. When defining OGs, one always chooses a taxonomic level of reference, i.e. the last common ancestor of the species included in the OG Because of this characteristic, many resources have focused their attention on providing hierarchically layered OGs [4,5,6,7,8] or "OG hierarchies" illustrated in (Fig. 1). EggNOG [9] and orthoDB [10] compute OGs independently at various radiations on the tree of life, while others rely on a graph based approach [11] or hierarchical pairwise comparison [8]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.