Abstract

BackgroundThe rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date. Moreover, existing rearrangement models do not fit the dichotomy between most prokaryotic genomes (one circular chromosome) and most eukaryotic genomes (multiple linear chromosomes).ResultsTo handle rearrangements, gene duplications and losses, we propose a new evolutionary model and the corresponding method for estimating true evolutionary distance. Our model, inspired from the DCJ model, is simple and the first to respect the prokaryotic/eukaryotic structural dichotomy. Experimental results on a wide variety of genome structures demonstrate the very high accuracy and robustness of our distance estimator.ConclusionWe give the first robust, statistically based, estimate of genomic pairwise distances based on rearrangements, duplications and losses, under a model that respects the structural dichotomy between prokaryotic and eukaryotic genomes. Accurate and robust estimates in true evolutionary distances should translate into much better phylogenetic reconstructions as well as more accurate genomic alignments, while our new model of genome rearrangements provides another refinement in simplicity and verisimilitude.

Highlights

  • The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution

  • We propose a new evolutionary model which respects the dichotomy between prokaryotic and eukaryotic genomes and which takes gene duplications and losses into account

  • Using this new evolutionary model, we develop a statistically based method to estimate the true evolutionary distance in terms of the actual number of rearrangements, gene duplications, and gene losses

Read more

Summary

Introduction

The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Researchers have used a two-stage process, in which a well defined measure is first computed (such as an edit distance, that is, the smallest number of evolutionary events needed to transform one genome into the other), a statistical model of evolution is used to infer an estimate of the true distance by deriving the effect of a given number of changes in the model on the computed measure and (algebraically or numerically) inverting the derivation to produce a maximum-likelihood estimate of the true distance under the model This second step is usually called a distance correction and has long been used for sequence (DNA) data (see, e.g., [2]) as well as, more recently, for geneorder data, for which see [3,4,5,6,7]

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call