Abstract

BackgroundOrtholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model).ResultsIn this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene will be deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pairs), and MSOAR is invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 is able to achieve a better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments, it is actually better than that of InParanoid in the simulation tests.ConclusionsOur preliminary experimental results demonstrate that MSOAR 2.0 is a highly accurate tool for one-to-one ortholog assignment between closely related genomes. The software is available to the public for free and included as online supplementary material.

Highlights

  • Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species

  • From the outputs of the three programs, we can compare their prediction accuracies in terms of sensitivity and specificity

  • As for specificity, since MSOAR 2.0 removes most of the inparalogs in TAGs based on the phylogenetic analysis and only the main ortholog pairs found by InParanoid are considered, the two programs do not introduce more false positives (FPs) when the number of tandem duplications increases

Read more

Summary

Introduction

Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. For each ortholog group, we may select a representative from each set of inparalogs (e.g., the exemplar gene) and define a one-to-one ortholog pair consisting of the two representatives. Such an ortholog pair may contain the two genes, one from each set, that correspond the best in terms of their positions on the genomes [4] or sequence similarity [2]. This allows us to think of orthology as a one-to-one relationship, which could help simplify the discussion in many cases and makes it possible to evaluate an ortholog assignment result against the orthology benchmark defined by gene symbols (which is a one-to-one relationship)

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call