Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise

Christophe Dessimoz, Manuel Gil

doi:10.1186/1471-2148-8-179

Open Access

https://doi.org/10.1186/1471-2148-8-179

Abstract

Background: The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence alignment (MSA). Estimators for the covariance of pairs from an MSA are known, but we are not aware of any solution for cases of pairs aligned independently. In large-scale analyses, it may be too costly to compute MSAs every time distances must be compared, and therefore a covariance estimator for distances estimated from pairs aligned independently is desirable. Knowledge of covariances improves any process that compares or combines distances, such as in generalized least-squares phylogenetic tree building, orthology inference, or lateral gene transfer detection.

Results: In this paper, we introduce an estimator for the covariance of distances from sequences aligned pairwise. Its performance is analyzed through extensive Monte Carlo simulations, and compared to the well-known variance estimator of ML distances. Our covariance estimator can be used together with the ML variance estimator to form covariance matrices.

Conclusion: The estimator performs similarly to the ML variance estimator. In particular, it shows no sign of bias when sequence divergence is below 150 PAM units (i.e. above ~29% expected sequence identity). Above that distance, the covariances tend to be underestimated, but then ML variances are also underestimated.
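To illustrate why a covariance estimate matters when two distances are compared, here is a minimal sketch. It is not the estimator introduced in the paper. It uses only the standard identity Var(d1 - d2) = Var(d1) + Var(d2) - 2 Cov(d1, d2), and it assumes the variances and the covariance have already been estimated by some other means. The function name and the numbers in the usage note are hypothetical.

```python
import math


def difference_z_score(d1, d2, var1, var2, cov12):
    """Standardized difference between two distance estimates.

    d1, d2     : the two estimated distances (e.g. in PAM units)
    var1, var2 : their estimated variances
    cov12      : their estimated covariance (hypothetical input; how it is
                 obtained is outside the scope of this sketch)
    """
    # Variance of a difference of two correlated estimates.
    var_diff = var1 + var2 - 2.0 * cov12
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    return (d1 - d2) / math.sqrt(var_diff)
```

With made-up values, `difference_z_score(50.0, 40.0, 30.0, 30.0, 10.0)` gives a larger standardized difference than the same call with `cov12=0.0`, because a positive covariance shrinks the variance of the difference. Ignoring the covariance would therefore understate how different the two distances are.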

Highlights

The estimation of a distance between two biological sequences is a fundamental process in molecular evolution
The most accurate matching of homologous characters is obtained by multiple sequence alignments (MSAs)
We present an estimator for the covariance of maximum likelihood (ML) distances estimated from optimal pairwise alignments (OPAs) that works on triplets and quartets of sequences

Summary

Introduction

The estimation of evolutionary distances between gene/protein sequences is one of the most important problems in molecular evolution. It lies at the heart of most phylogenetic tree construction methods. The sequences can be analyzed exclusively on the basis of pairs of sequences, using an algorithm such as Smith-Waterman [1] that yields optimal pairwise alignments (OPAs). This approach is often taken by large-scale comparative genomics analyses such as MIPS, OMA or RoundUp [2,3,4].
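For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the following is a minimal sketch of its local alignment score. It makes simplifying assumptions that are not taken from the paper: a fixed match/mismatch score and a linear gap penalty, whereas protein analyses typically use a substitution matrix and affine gap costs. The function name and default parameter values are illustrative only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the dynamic programming table.
    """
    cols = len(b) + 1
    prev = [0] * cols  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            # Score for aligning a[i-1] with b[j-1].
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # extend with a match or mismatch
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

This sketch returns only the score. Recovering the aligned characters themselves, which is what a distance estimate would be computed from, requires keeping the full table and tracing back from the best-scoring cell.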

Methods

Results

Conclusion

Journal: BMC Evolutionary Biology
Publication Date: Jan 1, 2008
Citations: 16
License type: cc-by
