No magic pill for phylogenetic error

Joseph W Thornton, Bryan Kolaczkowski

doi:10.1016/j.tig.2005.04.002

Abstract

Phylogenies provide the framework for all inferences in comparative biology, so obtaining the right tree is critical. Maximum parsimony (MP) is a non-parametric method that usually performs well, but certain branch-length combinations can create a strong bias – called long-branch attraction – in favor of the wrong tree [1,2]. Maximum likelihood (ML) will, in most cases, recover the true tree if the correct probabilistic model of the evolutionary process is used and enough data are provided. Limited research has been conducted on the performance of ML when the incorrect model is used. Simulation studies have shown that when models that are less complex than the true process are used, ML can become subject to long-branch attraction, although the bias is not as strong as with MP [3–5]. In these studies, the true model has always been available in major software packages, so the take-home message has been that we can be confident in ML results as long as we select the correct model.

Real sequences, however, are subject to selection pressures that might change over time and vary among sites. The diverse evolutionary dynamics that result are not modeled by current ML implementations, which assume an identically distributed evolutionary process for all sequence sites. Of particular concern is heterotachy – when evolutionary rates at specific sites differ among lineages because of changing selective constraints. Heterotachy, which has been shown to occur in numerous genes [6–11], is important because accurate estimates of branch lengths for each site are key to recovering the true tree. Our recent study [12], reviewed in this issue by Mike Steel [13], showed that some heterotachous conditions can cause ML to become strongly biased in favor of the incorrect tree, even when the best available model is used. This occurs because ML estimates branch lengths as compromises across all sites, which makes them incorrect for every site when heterotachy is present.
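The "compromise" effect can be seen in a deliberately simplified setting. The sketch below is not the four-taxon simulation of [12]; it assumes only two sequences, the Jukes-Cantor (JC69) substitution model, and two equally sized classes of sites that evolved along the same branch with different lengths. The function names and the values `t_short` and `t_long` are illustrative choices made here, not taken from the study.

```python
import math

def jc_p_diff(t):
    """JC69 probability that two sequences differ at a site,
    given t expected substitutions per site between them."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def jc_distance(p):
    """JC69 ML estimate of t from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def pooled_estimate(class_lengths, class_fractions):
    """Single branch length fitted to pooled sites, using the expected
    proportion of differences across all site classes."""
    p = sum(f * jc_p_diff(t) for t, f in zip(class_lengths, class_fractions))
    return jc_distance(p)

# Hypothetical values: half the sites evolve slowly, half quickly.
t_short, t_long = 0.05, 0.75
t_hat = pooled_estimate([t_short, t_long], [0.5, 0.5])

print(f"true lengths: {t_short} and {t_long}; pooled estimate: {t_hat:.3f}")
```

Under these assumptions the single fitted length falls between the two true values and matches neither, so it is wrong for every site in both classes.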
When the incorrect branch lengths are used, the likelihood of the incorrect tree can be greater than that of the true tree. Under some of the conditions we examined, ML is so strongly biased that it is outperformed by MP, which is unaffected by heterotachy. Furthermore, the deficiencies of ML in this case cannot be repaired using a better model, because a model resembling this heterogeneous evolutionary process has not been implemented.

Realism and mixed models

Steel raises several interesting questions and caveats. First, he argues that the conditions we investigated – convergent rate changes in non-sister lineages – are unrealistic. This might well be true, but the patterns of heterotachy in real sequence sets have not been explored adequately to support such a statement empirically. The idea that a site might be released from selection (or subjected to novel constraints) in parallel does not seem outlandish, although it might occur rarely. Consider, for example, a protein whose structure is stabilized by an interaction between the side chains on two helices; the specific sites involved in the interaction might change with time, constraining sites in some lineages that were previously neutral, and releasing formerly constrained sites to evolve more rapidly. If there are a finite number of sites that can participate in this interaction, a site might become part of the interaction in two separate lineages independently. More importantly, several other forms of heterotachy, which are probably more realistic, also cause ML to perform poorly – including sequences in which evolutionary constraints are released in a heterotachous manner in single lineages, and sequences that mix some sites containing a strong signal with others containing pure noise (our unpublished data).

Second, Steel implies that we are too pessimistic in our discussion of the potential that mixed models offer for improving ML.
In our article, we developed an ML model to accommodate heterotachy, in which each site can evolve on a mixture of two different sets of branch lengths; we showed that this technique performs much better than standard ML or parsimony. We are excited by the potential of this approach and are actively pursuing it. Models like this have not yet been implemented in a generally useful framework, however, and their accuracy and robustness under a wide range of conditions have not yet been validated. Indeed, there are non-trivial computational issues that limit the ability of current algorithms to find the optimal parameter values for mixture models; these will have to be solved before the method can be used on anything larger than a toy problem. Furthermore, the method is much more computationally intensive than standard ML, which might render it impractical for the large data sets that are usually required for phylogenetic accuracy. We therefore feel it is appropriate to temper our optimism about this new strategy with caution.

Selecting a good model

An additional concern is model selection: how does one know how many categories should be used in a mixed model? If models are significantly underparameterized, the same errors that occur with homogeneous ML are likely to be reproduced. If many categories are necessary – which can be a result of most sites having unique evolutionary dynamics – then the number of parameters approaches the number of sites, and the data will be
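The branch-length mixture described in this section can be sketched as a per-site weighted sum of likelihoods, one term per branch-length set. The code below is a minimal illustration, not the implementation used in [12]: it assumes the JC69 substitution model, a fixed four-taxon tree ((A,B),(C,D)), and brute-force summation over the two internal nodes. All function names, site patterns, branch lengths, and weights are hypothetical.

```python
import math
from itertools import product

STATES = "ACGT"

def jc_prob(i, j, t):
    """JC69 probability of observing state j after time t, starting from state i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(pattern, lengths):
    """Likelihood of one site pattern (states at A, B, C, D) on ((A,B),(C,D)).
    lengths = (tA, tB, tC, tD, t_internal)."""
    a, b, c, d = pattern
    t_a, t_b, t_c, t_d, t_int = lengths
    total = 0.0
    # x is the ancestor of A and B; y is the ancestor of C and D.
    for x, y in product(STATES, repeat=2):
        total += (0.25 * jc_prob(x, a, t_a) * jc_prob(x, b, t_b)
                  * jc_prob(x, y, t_int)
                  * jc_prob(y, c, t_c) * jc_prob(y, d, t_d))
    return total

def mixture_log_likelihood(patterns, length_sets, weights):
    """Log-likelihood when each site is a weighted mixture over branch-length sets."""
    log_lik = 0.0
    for pattern in patterns:
        site = sum(w * site_likelihood(pattern, ls)
                   for w, ls in zip(weights, length_sets))
        log_lik += math.log(site)
    return log_lik

# Hypothetical data and parameters: two sets whose long branches swap lineages.
patterns = ["AACC", "ACAC", "AAAA"]
lengths_a = (0.75, 0.05, 0.75, 0.05, 0.05)
lengths_b = (0.05, 0.75, 0.05, 0.75, 0.05)

print(mixture_log_likelihood(patterns, [lengths_a, lengths_b], [0.5, 0.5]))
```

In this sketch each added category contributes five branch lengths and one mixing weight on a four-taxon tree, which shows how quickly the parameter count grows as categories are added.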

Journal: Trends in Genetics	Publication Date: Apr 11, 2005
Citations: 33

