Orthology Prediction Research Articles

Orthology analysis aims to identify orthologous genes and gene products from different organisms and is therefore a powerful tool in modern computational and experimental biology. Although reconciliation-based orthology methods are generally considered more accurate than distance-based ones, the traditional parsimony-based implementation of reconciliation-based orthology analysis (most parsimonious reconciliation [MPR]) suffers from a number of shortcomings. For example, 1) it is limited to orthology predictions from the reconciliation that minimizes the number of gene duplication and loss events, 2) it cannot evaluate the support of this reconciliation in relation to the other reconciliations, and 3) it cannot make use of prior knowledge (e.g., about species divergence times) that provides auxiliary information for orthology predictions. We present a probabilistic approach to reconciliation-based orthology analysis that addresses all these issues by estimating orthology probabilities. The method is based on the gene evolution model, an explicit evolutionary model for gene duplication and gene loss inside a species tree that generalizes the standard birth-death process. We describe the probabilistic approach to orthology analysis using two experimental data sets and show that the use of orthology probabilities allows a more informative analysis than MPR and, in particular, that it is less sensitive to taxon sampling problems. We generalize these anecdotal observations and show, using data generated under biologically realistic conditions, that MPR gives false orthology predictions at a substantial frequency. Last, we provide a new orthology prediction method that allows an orthology and paralogy classification with any chosen sensitivity/specificity combination from the spectrum of achievable combinations. We conclude that probabilistic orthology analysis is a strong and more advanced alternative to traditional orthology analysis and that it provides a framework for sophisticated comparative studies of processes in genome evolution.
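The idea of choosing a sensitivity/specificity combination can be illustrated with a minimal sketch. It assumes that each gene pair already has an estimated orthology probability and that a reference labelling is available for evaluation; the function names, the probabilities, and the labels below are hypothetical and are not taken from the article or its software.

```python
from typing import List, Tuple


def classify(probabilities: List[float], threshold: float) -> List[bool]:
    """Call a gene pair orthologous when its probability reaches the threshold."""
    return [p >= threshold for p in probabilities]


def sensitivity_specificity(
    probabilities: List[float], truth: List[bool], threshold: float
) -> Tuple[float, float]:
    """Compute sensitivity and specificity of thresholded orthology calls."""
    predicted = classify(probabilities, threshold)
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # orthologs called orthologs
    fn = sum(1 for p, t in pairs if not p and t)      # orthologs called paralogs
    tn = sum(1 for p, t in pairs if not p and not t)  # paralogs called paralogs
    fp = sum(1 for p, t in pairs if p and not t)      # paralogs called orthologs
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical orthology probabilities and reference labels (True = ortholog).
probs = [0.95, 0.80, 0.60, 0.40, 0.10]
truth = [True, True, False, True, False]

for threshold in (0.3, 0.5, 0.9):
    sens, spec = sensitivity_specificity(probs, truth, threshold)
    print(f"threshold={threshold:.1f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, which is the kind of trade-off a probability-based classification exposes and a single most parsimonious reconciliation does not.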


Background: Phylogenetic studies using expressed sequence tags (ESTs) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time consuming. Thus, software tools are needed that carry out these steps automatically.

Results: We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OGs), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of identified orthologous sequences, and offers the possibility to further process these alignments in a last step by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful to non-bioinformaticians as well as bioinformatics experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences.

Conclusion: OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. Our tool was not only able to rebuild the data set with a specificity of 98%, but it also detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolute agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g., handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux/Mac OS X. The tool can be downloaded at
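The step of eliminating redundant sequences assigned to the same OG can be sketched as follows. This is only an illustration under stated assumptions: it supposes that each EST carries a similarity score from the database search and that one representative per taxon and OG is kept by highest score. The abstract does not state OrthoSelect's actual selection criterion, and the `Hit` record, the function name, and the example data are hypothetical (written in Python rather than the tool's Perl).

```python
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical record of one EST assigned to an orthologous group (OG)."""
    seq_id: str
    taxon: str
    og: str
    score: float  # assumed similarity score from the database search


def select_representatives(hits: List[Hit]) -> Dict[Tuple[str, str], Hit]:
    """Keep the highest-scoring sequence for each (OG, taxon) combination."""
    best: Dict[Tuple[str, str], Hit] = {}
    for hit in hits:
        key = (hit.og, hit.taxon)
        # Replace the stored hit only when the new one scores strictly higher.
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    return best


# Hypothetical assignments: two redundant ESTs from taxon_A in the same OG.
hits = [
    Hit("est_001", "taxon_A", "OG_1", 210.0),
    Hit("est_002", "taxon_A", "OG_1", 340.5),
    Hit("est_003", "taxon_B", "OG_1", 180.0),
    Hit("est_004", "taxon_A", "OG_2", 95.0),
]

representatives = select_representatives(hits)
for (og, taxon), hit in sorted(representatives.items()):
    print(og, taxon, hit.seq_id, hit.score)
```

Keying on the (OG, taxon) pair leaves at most one sequence per species in each group, which is the shape of input that alignment and tree-reconstruction steps typically expect.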

Articles published on Orthology Prediction

Ortholog identification in the presence of domain architecture rearrangement

Positional orthology: putting genomic evolutionary relationships into context

Letter to the Editor: SeqXML and OrthoXML: standards for sequence and orthology information

OrthoList: a compendium of C. elegans genes with human orthologs.

QuartetS: a fast and accurate algorithm for large-scale orthology detection

Evaluating ortholog prediction algorithms in a yeast model clade.

IsoBase: a database of functionally related proteins across PPI networks

MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score

PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions

A statistical approach to high-throughput screening of predicted orthologs

RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

A Phylogenomic Approach to Resolve the Arthropod Tree of Life

Innate Immune Signaling Pathways in Reactome (94.13)

ETE: a python Environment for Tree Exploration

A New Approach to Find Orthologous Proteins Using Sequence and Protein-Protein Interaction Similarity

Benchmarking Next-Generation Transcriptome Sequencing for Functional and Evolutionary Genomics

Probabilistic Orthology Analysis

OrthoSelect: a protocol for selecting orthologous groups in phylogenomics

Reactome - a knowledgebase of human biological pathways

The Tree versus the Forest: The Fungal Tree of Life and the Topological Diversity within the Yeast Phylome

