Exploring the Relationship between Sequence Similarity and Accurate Phylogenetic Trees

B L Cantarel

doi:10.1093/molbev/msl080

Abstract

We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees.
For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.
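
The abstract describes the topology score only as "the fraction of clades that were correctly clustered" and does not give its exact definition. The Python sketch below shows one plausible reading, under stated assumptions: trees are rooted and written as nested tuples of leaf names, a clade is the set of leaves below an internal node, and the score is the share of non-root clades of the true tree that also appear in the recovered tree. The names `clades` and `clade_recovery` are hypothetical and are not taken from the paper or from any library.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumption: a rooted tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set under every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def clade_recovery(true_tree: Tree, recovered_tree: Tree) -> float:
    """Fraction of non-root clades of true_tree found in recovered_tree."""
    true_clades = clades(true_tree)
    n_leaves = max((len(c) for c in true_clades), default=0)
    # The root clade holds every leaf and matches trivially, so skip it.
    informative = {c for c in true_clades if len(c) < n_leaves}
    if not informative:
        return 1.0
    return len(informative & clades(recovered_tree)) / len(informative)


# Toy example: the recovered tree misplaces B and C inside one subtree.
true_tree = ((("A", "B"), "C"), ("D", "E"))
recovered_tree = ((("A", "C"), "B"), ("D", "E"))
score = clade_recovery(true_tree, recovered_tree)
print(f"{score:.2f}")
```

In this toy case the true tree has three non-root clades, {A, B}, {A, B, C}, and {D, E}, and the recovered tree keeps the last two. Under this reading, a tree counted as accurate by the "over 80% of clades" criterion would need a score above 0.8.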


Journal: Molecular Biology and Evolution
Publication Date: Aug 10, 2006
Citations: 32
License type: cc-by-nc

Similar Papers

Phylogenetic Noise Leads to Unbalanced Cladistic Tree Reconstructions
Arne Ø Mooers ... Roderic D M Page
Systematic Biology | VOL. 44
01 Sep 1995

Evaluating the Relationship between Evolutionary Divergence and Phylogenetic Accuracy in AFLP Data Sets
María Jesús García-Pereira ... Humberto Quesada
Molecular Biology and Evolution | VOL. 27
21 Dec 2009

SATé-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees
Kevin Liu ... C Randal Linder
Systematic Biology | VOL. 61
01 Dec 2011

DPRml: distributed phylogeny reconstruction by maximum likelihood
T M Keane ... G P Mccormack
Bioinformatics | VOL. 21
28 Oct 2004
