Model Selection for Mixtures of Mutagenetic Trees

Junming Yin, Thomas Lengauer, Niko Beerenwinkel, Jörg Rahnenführer

doi:10.2202/1544-6115.1164

Abstract

The evolution of drug resistance in HIV is characterized by the accumulation of resistance-associated mutations in the HIV genome. Mutagenetic trees, a family of restricted Bayesian tree models, have been applied to infer the order and rate of occurrence of these mutations. Understanding and predicting this evolutionary process is an important prerequisite for the rational design of antiretroviral therapies. In practice, mixture models of K mutagenetic trees provide more flexibility and are often more appropriate for modelling observed mutational patterns. Here, we investigate the model selection problem for mixture models of K mutagenetic trees. We evaluate several classical model selection criteria, including cross-validation, the Bayesian Information Criterion (BIC), and the Akaike Information Criterion (AIC). We also use the empirical Bayes method, constructing a prior probability distribution for the parameters of a mutagenetic trees mixture model and deriving the posterior probability of the model. In addition to the model dimension, we consider the redundancy of a mixture model, which is measured by comparing the topologies of the trees within the mixture. Based on this redundancy, we propose a new model selection criterion, which is a modification of the BIC. Experimental results on simulated and real HIV data show that the classical criteria tend to select models with far too many tree components. Only cross-validation and the modified BIC recover the correct number of trees and the tree topologies most of the time. While achieving the same performance, the modified BIC runs about one order of magnitude faster than cross-validation. Thus, this model selection criterion can also be used for large data sets for which cross-validation becomes computationally infeasible.
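For readers unfamiliar with the classical criteria named above, the following is a minimal sketch of how the standard AIC and BIC trade off fit against model dimension when choosing the number of mixture components K. It uses only the textbook definitions (AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L). The log-likelihoods, parameter counts, and sample size are hypothetical values chosen for illustration, and the redundancy-based modification of the BIC proposed in the paper is not specified in the abstract and is not reproduced here.

```python
import math

def aic(log_likelihood, num_params):
    """Standard Akaike Information Criterion (lower is better)."""
    return 2 * num_params - 2 * log_likelihood

def bic(log_likelihood, num_params, num_samples):
    """Standard Bayesian Information Criterion (lower is better)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood

# Hypothetical fits, NOT taken from the paper:
# K -> (maximized log-likelihood, number of free parameters)
candidates = {
    1: (-1250.0, 6),
    2: (-1200.0, 13),
    3: (-1190.0, 20),
}
n = 500  # hypothetical number of observed mutational patterns

def select(candidates, score):
    """Return the K whose fit minimizes the given criterion."""
    return min(candidates, key=lambda k: score(*candidates[k]))

best_aic = select(candidates, aic)
best_bic = select(candidates, lambda ll, k: bic(ll, k, n))

print("K selected by AIC:", best_aic)
print("K selected by BIC:", best_bic)
```

With these made-up numbers, AIC selects K = 3 while BIC selects K = 2, because the BIC penalty per parameter (ln 500, roughly 6.2) is larger than the AIC penalty of 2. This only illustrates the general mechanics of penalized-likelihood selection; it does not reproduce the paper's experiments.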

Journal: Statistical Applications in Genetics and Molecular Biology
Publication Date: Jan 23, 2006
Citations: 11
