Abstract

The identification of orthologous genes shared by multiple genomes plays an important role in evolutionary studies and gene functional analyses. Building on MSOAR 2.0, a recently developed and accurate tool that uses genome rearrangement to assign orthologs between a pair of closely related genomes, we present a new system, MultiMSOAR 2.0, to identify ortholog groups among multiple genomes. The system constructs gene families for all the genomes using sequence similarity search and clustering, runs MSOAR 2.0 on all pairs of genomes to obtain the pairwise orthology relationships, and partitions each gene family into disjoint sets of orthologous genes (called super ortholog groups, or SOGs) such that each SOG contains at most one gene from each genome. For each SOG, we label the leaves of the species tree with 1 or 0 to indicate whether or not the SOG contains a gene from the corresponding species. The resulting tree is called a tree of ortholog groups (TOG). We then label the internal nodes of each TOG based on the parsimony principle and some biological constraints. Ortholog groups are finally identified from each fully labeled TOG. In comparison with the popular tool MultiParanoid on simulated data, MultiMSOAR 2.0 shows significantly higher prediction accuracy. It also outperforms MultiParanoid, the Roundup multi-ortholog repository and the Ensembl ortholog database in real data experiments that use gene symbols as a validation tool. In addition to identifying ortholog groups, MultiMSOAR 2.0 provides information about gene births, duplications and losses in evolution, which may be of independent biological interest. Our experiments on simulated data demonstrate that MultiMSOAR 2.0 infers these evolutionary events much more accurately than the well-known software tool Notung. The MultiMSOAR 2.0 software is freely available to the public.
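The TOG labeling step described above can be illustrated with a small sketch. The Python code below is not the MultiMSOAR 2.0 implementation: it labels the leaves of a toy species tree with 1 or 0 for one SOG and then fills in the internal nodes with plain Fitch parsimony, a standard technique substituted here because the abstract does not specify the exact procedure or the biological constraints. The species names, gene identifiers, the `Node` class, the restriction to binary trees and the rule that breaks a tie at the root in favor of 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """A node of a binary species tree (hypothetical helper class)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    states: Set[int] = field(default_factory=set)  # candidate labels (Fitch sets)
    label: Optional[int] = None                    # final 0/1 label


def label_leaves(node: Node, sog: Dict[str, str]) -> None:
    """Label each leaf 1 if the SOG has a gene from that species, else 0."""
    if not node.children:
        node.label = 1 if node.name in sog else 0
        node.states = {node.label}
    for child in node.children:
        label_leaves(child, sog)


def fitch_up(node: Node) -> int:
    """Bottom-up Fitch pass; returns the minimum number of label changes."""
    if not node.children:
        return 0
    left, right = node.children  # assumption: strictly binary tree
    changes = fitch_up(left) + fitch_up(right)
    common = left.states & right.states
    if common:
        node.states = common
    else:
        node.states = left.states | right.states
        changes += 1
    return changes


def fitch_down(node: Node, parent_label: Optional[int] = None) -> None:
    """Top-down pass: keep the parent's label when the Fitch set allows it."""
    if node.children:
        if parent_label in node.states:
            node.label = parent_label
        else:
            # Assumed tie-break: prefer 1 (gene present) when both are possible.
            node.label = max(node.states)
    for child in node.children:
        fitch_down(child, node.label)


def changed_edges(node: Node) -> List[Tuple[str, str]]:
    """List (parent, child) edges whose endpoint labels differ."""
    edges: List[Tuple[str, str]] = []
    for child in node.children:
        if child.label != node.label:
            edges.append((node.name, child.name))
        edges.extend(changed_edges(child))
    return edges


# Toy species tree: ((human, chimp), (mouse, rat)).
primates = Node("primates", [Node("human"), Node("chimp")])
rodents = Node("rodents", [Node("mouse"), Node("rat")])
root = Node("root", [primates, rodents])

# One hypothetical SOG: at most one gene per genome, none from rat.
sog = {"human": "geneH1", "chimp": "geneC1", "mouse": "geneM1"}

label_leaves(root, sog)
changes = fitch_up(root)
fitch_down(root)
edges = changed_edges(root)

print("label changes:", changes)
print("changed edges:", edges)
```

In this toy case the only label change falls on the branch leading to rat, which a fuller analysis might interpret as a gene loss. How MultiMSOAR 2.0 itself maps label changes to births, duplications and losses is not covered by this sketch.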

Highlights

  • The ever-increasing number of completely sequenced genomes brings great opportunities as well as challenges to the study of comparative genomics

  • In order to test the performance of our system MultiMSOAR 2.0, we first apply it to simulated data, and compare it with the popular ortholog assignment tool MultiParanoid [27] for multiple genomes

  • We only consider the common and different ortholog groups constructed by MultiMSOAR 2.0, MultiParanoid, Roundup and the Ensembl ortholog database


Summary

Introduction

The ever-increasing number of completely sequenced genomes brings great opportunities as well as challenges to the study of comparative genomics. It makes it possible to study the evolutionary history of closely related species at the genome level, and it enhances our ability to perform gene functional analyses across different species. Orthologous genes (i.e., orthologs) are genes in different genomes that evolved from a common ancestral gene through speciation events [1]. They are more likely to preserve the original gene function. Orthologs are often used as universal and unique landmarks within each genome as well as links across different genomes [2].

