Abstract

Bayesian variable selection is an important method for discovering variables which are most useful for explaining the variation in a response. The widespread use of this method has been restricted by the challenging computational problem of sampling from the corresponding posterior distribution. Recently, the use of adaptive Monte Carlo methods has been shown to lead to performance improvement over traditionally used algorithms in linear regression models. This paper looks at applying one of these algorithms (the adaptively scaled independence sampler) to logistic regression and accelerated failure time models. We investigate the use of this algorithm with data augmentation, Laplace approximation and the correlated pseudo-marginal method. The performance of the algorithms is compared on several genomic data sets.

Highlights

  • The availability of large-scale data sets has led to interest in variable selection for regression models with a large number of regressors

  • As discussed by García-Donato and Martínez-Beneito (2013), these issues can be addressed by sampling from the posterior distribution using Markov chain Monte Carlo (MCMC) algorithms, which provide unbiased estimates of quantities of interest such as the posterior inclusion probability (PIP) for the jth variable (the marginal posterior probability that the jth variable is included in the model) or Bayesian model-averaged predictions

  • Sha et al. (2006) initially demonstrated how MCMC with data augmentation could be used for Bayesian variable selection when the errors of the accelerated failure time (AFT) model follow a normal or t distribution
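Once an MCMC sampler returns draws of the binary inclusion indicators, the PIP mentioned above is just the Monte Carlo average of each indicator. A minimal sketch (the `gamma_draws` array is a hypothetical sampler output, not from the paper):

```python
import numpy as np

# Hypothetical MCMC draws of the inclusion indicators:
# gamma_draws[t, j] = 1 if variable j is in the model at iteration t.
gamma_draws = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
])

# The estimated PIP for variable j is the column mean of its indicator.
pip = gamma_draws.mean(axis=0)
print(pip)  # -> [1.   0.25 0.75]
```

In practice the first block of iterations would be discarded as burn-in before averaging.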


Summary

Introduction

The availability of large-scale data sets has led to interest in variable selection for regression models with a large number of regressors. These are often called “large p, small n” variable selection problems. The Bayesian approach places a prior distribution on the possible models (subsets of the potential regressors) and on the parameters of each of these models (the regression coefficients and other parameters such as dispersion parameters). This defines a posterior distribution on the models and their parameters which can be used to investigate the importance of different variables or to make predictions for future observations. The challenge of sampling over the joint space of models and parameters can be circumvented in the linear regression model by working with the marginal likelihood of the models (which is available analytically for commonly used prior distributions), but this is not possible in other generalized linear models. Designing MCMC algorithms for Bayesian variable selection which mix well is a computationally challenging task when p is large, and a large literature has developed around different approaches
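The marginal-likelihood route in the linear model can be sketched concretely. The following is a minimal illustration, not the paper's adaptively scaled independence sampler: a single-flip Metropolis–Hastings sampler over the inclusion indicators, using the closed-form marginal likelihood under Zellner's g-prior. The default g = max(n, p²), the Bernoulli(h) model prior, and all function names are assumptions made for this sketch:

```python
import numpy as np

def log_marginal_g_prior(y, X, gamma, g):
    """Log Bayes factor of model `gamma` against the intercept-only model
    under Zellner's g-prior (Liang et al. 2008 form)."""
    n = len(y)
    p_gam = int(gamma.sum())
    if p_gam == 0:
        return 0.0
    yc = y - y.mean()
    Xc = X[:, gamma.astype(bool)]
    Xc = Xc - Xc.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)  # least-squares fit
    resid = yc - Xc @ beta
    R2 = 1.0 - (resid @ resid) / (yc @ yc)          # ordinary R-squared
    return (0.5 * (n - 1 - p_gam) * np.log1p(g)
            - 0.5 * (n - 1) * np.log1p(g * (1.0 - R2)))

def mh_variable_selection(y, X, n_iter=5000, g=None, h=0.1, seed=0):
    """Single-flip Metropolis-Hastings over the inclusion indicators,
    with an independent Bernoulli(h) prior on each indicator."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if g is None:
        g = max(n, p * p)            # "benchmark" default, an assumption here
    gamma = np.zeros(p, dtype=int)   # start from the null model
    logm = log_marginal_g_prior(y, X, gamma, g)
    log_odds = np.log(h / (1.0 - h))
    draws = np.empty((n_iter, p), dtype=int)
    for t in range(n_iter):
        j = rng.integers(p)          # propose flipping one indicator
        prop = gamma.copy()
        prop[j] ^= 1
        logm_prop = log_marginal_g_prior(y, X, prop, g)
        log_prior_ratio = log_odds if prop[j] else -log_odds
        if np.log(rng.random()) < logm_prop - logm + log_prior_ratio:
            gamma, logm = prop, logm_prop    # accept the flip
        draws[t] = gamma
    return draws
```

The PIPs are then the column means of `draws` after discarding burn-in. Because the regression coefficients are integrated out analytically, the sampler only moves on the discrete model space; it is exactly this integration that is unavailable in logistic regression and AFT models, motivating the data augmentation, Laplace approximation, and pseudo-marginal approaches discussed below.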

Generalized linear models and Bayesian variable selection
Computational approaches
Data augmentation
Laplace approximation
Correlated pseudo-marginal sampler
Accelerated failure time modelling
Discussion

