Does the choice of nucleotide substitution models matter topologically?

Michael Hoff, Stefan Orf, Benedikt Riehm, Alexandros Stamatakis, Diego Darriba

doi:10.1186/s12859-016-0985-x

Abstract

Background: In the context of a master-level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed and make available open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike (AIC), corrected Akaike (AICc), and Bayesian (BIC) information criteria. We address the question of whether model selection matters topologically, that is, whether conducting ML inferences under the optimal model, instead of a standard General Time Reversible (GTR) model, yields different tree topologies. We also assess to what degree the models selected and the trees inferred under the three standard criteria (AIC, AICc, BIC) differ. Finally, we assess whether the definition of the sample size (#sites versus #sites × #taxa) yields different models and, as a consequence, different tree topologies.

Results: We find that all three factors (in order of impact: nucleotide model selection, information criterion used, sample size definition) can yield substantially different final tree topologies (topological difference exceeding 10 %) for approximately 5 % of the tree inferences conducted on the 39 empirical datasets used in our study.

Conclusions: We find that using the best-fit nucleotide substitution model may change the final ML tree topology compared to an inference under a default GTR model. The effect is less pronounced when comparing distinct information criteria. Nonetheless, in some cases we did obtain substantial topological differences.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0985-x) contains supplementary material, which is available to authorized users.

Highlights

In the context of a master level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed and make available an open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike, corrected Akaike, and Bayesian information criteria
We present a set of experiments on empirical datasets to answer the following question: Does model selection really matter with respect to its impact on the shape of the final tree topology? Posada and Buckley discussed the potential impact of the sample size on AICc and BIC criteria [15]
We assess the magnitude of topological differences between trees inferred under GTR+ and trees inferred under the best-fit model according to the respective information criterion
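To make the role of the sample size concrete, the following minimal sketch computes the three criteria from their standard textbook definitions (AIC = 2k − 2 ln L, AICc = AIC + 2k(k+1)/(n−k−1), BIC = k ln n − 2 ln L). It is not the authors' code: the function name `information_criteria`, the log-likelihood, the parameter count, and the alignment dimensions are all hypothetical values chosen for illustration.

```python
import math


def information_criteria(log_likelihood, num_params, sample_size):
    """Return (AIC, AICc, BIC) using the standard textbook definitions."""
    k, n = num_params, sample_size
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when sample_size <= num_params + 1")
    aic = 2 * k - 2 * log_likelihood
    # AICc adds a small-sample correction that shrinks as n grows.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    # The BIC penalty per parameter grows with ln(n).
    bic = k * math.log(n) - 2 * log_likelihood
    return aic, aicc, bic


# Hypothetical alignment and model fit (illustrative numbers only).
num_sites, num_taxa = 1000, 20
log_likelihood = -12345.6
num_params = 45  # assumed count of free parameters, e.g. rates plus branch lengths

# Two sample size definitions: #sites versus #sites x #taxa.
by_sites = information_criteria(log_likelihood, num_params, num_sites)
by_cells = information_criteria(log_likelihood, num_params, num_sites * num_taxa)

for label, (aic, aicc, bic) in (("n = #sites", by_sites),
                                ("n = #sites x #taxa", by_cells)):
    print(f"{label:>20}: AIC={aic:.1f}  AICc={aicc:.1f}  BIC={bic:.1f}")
```

AIC does not depend on n, so only AICc and BIC react to the sample size definition: a larger n weakens the AICc correction and strengthens the BIC penalty, which is why the two definitions can rank candidate models differently.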

Summary

Introduction

We address the question of whether model selection matters topologically, that is, whether conducting ML inferences under the optimal model, instead of a standard General Time Reversible model, yields different tree topologies. We also assess to what degree the models selected and the trees inferred under the three standard criteria (AIC, AICc, BIC) differ. Statistical models of DNA evolution as used in Bayesian inference (BI) and Maximum Likelihood (ML) methods for phylogenetic reconstruction are typically required to be time-reversible. For a nucleotide substitution matrix to be time-reversible, it must exhibit a certain symmetry. This symmetry requirement can be illustrated with a matrix whose rows and columns are indexed by the nucleotides A, C, G, and T.
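The symmetry requirement can be sketched in code. A symmetric 4 × 4 matrix over A, C, G, and T has six off-diagonal rates, one per unordered nucleotide pair, and the figure of 203 models is consistent with the number of ways to group those six rates into classes that share a value (the Bell number B6 = 203). The sketch below is an illustration under that reading, not the authors' implementation; `set_partitions` and `symmetric_rate_matrix` are hypothetical helper names, and the diagonal and base frequencies are deliberately left out.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"
# The six unordered nucleotide pairs: AC, AG, AT, CG, CT, GT.
PAIRS = list(combinations(NUCLEOTIDES, 2))


def set_partitions(items):
    """Yield every partition of `items` into non-empty groups, each exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new group of its own.
        yield [[first]] + partition


def symmetric_rate_matrix(partition, group_rates):
    """Build a 4x4 matrix where pairs in one group share a rate and R[x][y] == R[y][x].

    Diagonal entries are left at 0.0; this sketch covers only the symmetric
    off-diagonal rates.
    """
    index = {nuc: i for i, nuc in enumerate(NUCLEOTIDES)}
    matrix = [[0.0] * 4 for _ in range(4)]
    for group, rate in zip(partition, group_rates):
        for x, y in group:
            matrix[index[x]][index[y]] = rate
            matrix[index[y]][index[x]] = rate
    return matrix


schemes = list(set_partitions(PAIRS))
print(len(schemes))  # 203

# The finest partition gives every pair its own rate (six free rates);
# the coarsest puts all pairs in a single group (one shared rate).
six_rates = max(schemes, key=len)
one_rate = min(schemes, key=len)
example = symmetric_rate_matrix(six_rates, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
for nuc, row in zip(NUCLEOTIDES, example):
    print(nuc, row)
```

The finest partition corresponds to a matrix with six independent rates, as in the General Time Reversible model, while the coarsest forces all six rates to be equal.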

Methods

Results

Discussion

Conclusion


Journal: BMC Bioinformatics	Publication Date: Mar 24, 2016
Citations: 51	License type: CC BY 4.0


