Abstract

Background: Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods.

Results: If additive distances between genes are known, then evolutionarily most closely related pairs can be identified by considering certain quartets of genes, provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of the underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic, since the correct outgroup cannot be determined reliably in all cases, simulations with lineage-specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distance data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches.

Conclusion: Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations.

Availability: Accompanying software is available at https://github.com/david-schaller/AsymmeTree.
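To make the role of the outgroup concrete, the following minimal sketch contrasts a plain best hit with an outgroup-corrected choice on a toy gene tree. It is an illustration under stated assumptions, not the AsymmeTree implementation: the tree, its branch lengths, and the function names (`tree_distance`, `best_hits`, `best_matches_with_outgroup`) are all hypothetical. The correction uses the standard fact that, for additive distances and an outgroup z, the quantity (d(x,z) + d(y,z) - d(x,y)) / 2 measures how far from z the paths to x and y separate, so the candidate maximizing it shares the most recent common ancestor with x.

```python
# Hypothetical toy gene tree, given as child -> (parent, branch length).
# D is a duplication, S a speciation below it. Gene x lives in species A,
# genes y1 and y2 in species B, and z is an outgroup gene.
# y1 is the true best match of x, but it evolves fast (long branch).
PARENT = {
    "z": ("R", 1.0),
    "D": ("R", 1.0),
    "S": ("D", 1.0),
    "y2": ("D", 1.0),
    "x": ("S", 1.0),
    "y1": ("S", 5.0),
}


def depths(node):
    """Map node and each of its ancestors to the path length from node."""
    out = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        out[parent] = total
        node = parent
    return out


def tree_distance(a, b):
    """Additive distance: path length through the last common ancestor."""
    da, db = depths(a), depths(b)
    return min(da[n] + db[n] for n in da if n in db)


def best_hits(query, candidates, tol=1e-9):
    """Candidates with the smallest distance to the query (plain best hits)."""
    dist = {c: tree_distance(query, c) for c in candidates}
    best = min(dist.values())
    return sorted(c for c, v in dist.items() if v <= best + tol)


def separation_from_outgroup(x, y, z):
    """Distance from z to the point where the paths to x and y split."""
    return (tree_distance(x, z) + tree_distance(y, z) - tree_distance(x, y)) / 2


def best_matches_with_outgroup(query, candidates, outgroup, tol=1e-9):
    """Candidates whose split from the query lies farthest from the outgroup."""
    score = {c: separation_from_outgroup(query, c, outgroup) for c in candidates}
    best = max(score.values())
    return sorted(c for c, v in score.items() if v >= best - tol)


if __name__ == "__main__":
    print(best_hits("x", ["y1", "y2"]))
    print(best_matches_with_outgroup("x", ["y1", "y2"], "z"))
```

In this toy tree the plain best hit of x is the slowly evolving paralog y2, whereas the outgroup-corrected choice is y1. The sketch assumes exactly additive distances and a correctly chosen outgroup; with distances estimated from sequence data, both assumptions hold only approximately.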

Highlights

  • As defined by Walter Fitch [1, 2], two genes are orthologs if their last common ancestor corresponds to a speciation event, and they are paralogs if they arose through

  • Paralogous members of a gene family often differ in their evolutionary rates due to changes in function [13, 14]. Both the “Duplication-Degeneration-Complementation” (DDC) model [15] and the “Escape from Adaptive Conflict” (EAC) model [16] predict that the fate of paralogs, including their evolutionary rate, may differ substantially between lineages that diverge soon after the duplication event due to different selective pressures.

Summary

Introduction

Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. While in fugu (Takifugu rubripes) and other percomorphs the HOXAb paralogs diverge faster, it is the HOXA13b paralog that evolves at a faster rate in zebrafish (Danio rerio), which diverged early from percomorphs within the Teleostei clade.
