Variable Selection and Interaction Detection with Bayesian Additive Regression Trees

Carlos M Carvalho, Robert E McCulloch, Edward I George, P Richard Hahn

doi:10.1201/9781003089018-17

Abstract

Bayesian Additive Regression Trees (BART) has emerged as a highly effective Bayesian approach to ensemble modeling with many binary trees. The BART Markov chain Monte Carlo (MCMC) algorithm provides an effective stochastic search of a complex model space together with Bayesian uncertainty quantification. As with many modern approaches, the overall complexity of the model makes interpretation difficult. In practice, investigators often wish to know which predictor variables are important or, more generally, what roles the variables play in the model. In this chapter we review some approaches for understanding how variables enter the BART model. We present simple ways to find out which variables are most important, which pairs of variables interact in the model, and which subsets of variables allow us to approximate the full-information inference according to a user-defined metric. In all cases, our approaches are based on post-processing the output of basic BART modeling, so that uncertainty is captured in a straightforward way by the usual MCMC variation.

Full Text