Abstract

BackgroundExplicit evolutionary models are required in maximum-likelihood and Bayesian inference, the two methods that are overwhelmingly used in phylogenetic studies of DNA sequence data. Appropriate selection of nucleotide substitution models is important because the use of incorrect models can mislead phylogenetic inference. To better understand the performance of different model-selection criteria, we used 33,600 simulated data sets to analyse the accuracy, precision, dissimilarity, and biases of the hierarchical likelihood-ratio test, Akaike information criterion, Bayesian information criterion, and decision theory.ResultsWe demonstrate that the Bayesian information criterion and decision theory are the most appropriate model-selection criteria because of their high accuracy and precision. Our results also indicate that in some situations different models are selected by different criteria for the same dataset. Such dissimilarity was the highest between the hierarchical likelihood-ratio test and Akaike information criterion, and lowest between the Bayesian information criterion and decision theory. The hierarchical likelihood-ratio test performed poorly when the true model included a proportion of invariable sites, while the Bayesian information criterion and decision theory generally exhibited similar performance to each other.ConclusionsOur results indicate that the Bayesian information criterion and decision theory should be preferred for model selection. Together with model-adequacy tests, accurate model selection will serve to improve the reliability of phylogenetic inference and related analyses.

Highlights

  • Explicit evolutionary models are required in maximum-likelihood and Bayesian inference, the two methods that are overwhelmingly used in phylogenetic studies of DNA sequence data

  • The ANOVA-LSD tests demonstrated that there were no significant differences for the pairs of hierarchical likelihoodratio test (hLRT)-Akaike information criterion (AIC) and Bayesian information criterion (BIC)-decision theory (DT) respectively; very significant differences existed for the other pairs such as hLRT-BIC (P < 0.01)

  • The accuracy of the BIC and DT differed among simulations

Read more

Summary

Introduction

Explicit evolutionary models are required in maximum-likelihood and Bayesian inference, the two methods that are overwhelmingly used in phylogenetic studies of DNA sequence data. Among the rigorous methods of tree reconstruction that are available, maximum likelihood (ML) and Bayesian inference (BI) have dominated phylogenetic studies in recent years [1,2,3,4,5,6] Both methods are based on the likelihood function, which needs explicit models of evolution to capture the underlying evolutionary processes in sequence data [1,2,3,4,5,6]. In the last few years, many improvements have been explored, including models that account for differences among the three codon positions [10,11], pattern heterogeneity of the substitution process (e.g., [12]), among-site heterogeneity of rates (e.g., [13]), compositional heterogeneity among lineages (e.g., [14]), and site-specific rate variation through time (e.g., [15,16])

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call