Subsampled Information Criteria for Bayesian Model Selection in the Big Data Setting

Lijiang Geng, Yishu Xue, Guanyu Hu

doi:10.1109/bigdata47090.2019.9006275

Abstract

Bayesian methods face unprecedented challenges in the era of big data, because evaluating the likelihood at every iteration is computationally intensive. To address this bottleneck, recent literature has focused mostly on speeding up Markov chain Monte Carlo (MCMC), while model selection, despite its importance, has not received much attention. In the Bayesian context, deviance-based criteria such as the deviance information criterion (DIC) are well known for model selection. In this article, we introduce the subsampled DIC and the subsampled information criterion IC for the big data setting. Extensive simulation studies are conducted to evaluate the empirical performance of the proposed criteria, and their use is further illustrated with an analysis of the Covertype dataset.
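For readers unfamiliar with DIC, recall the standard definition: with deviance D(θ) = -2 log p(y | θ), posterior mean deviance D̄, and effective number of parameters p_D = D̄ - D(θ̄), the criterion is DIC = D̄ + p_D. The sketch below shows one generic way to subsample it. It assumes a Gaussian mean model with known variance and a simple random subsample of size m whose log-likelihood is rescaled by n/m; this is an illustration of the general idea, not the estimator defined in the paper. The functions `dic` and `subsampled_dic` are hypothetical names.

```python
import numpy as np

def normal_loglik(y, theta, sigma=1.0):
    # Pointwise Gaussian log-density log N(y_i | theta, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - theta) ** 2 / (2 * sigma**2)

def dic(y, draws, sigma=1.0, scale=1.0):
    # Deviance D(theta) = -2 * scale * sum_i log p(y_i | theta); scale = 1 for full data
    dev = np.array([-2.0 * scale * normal_loglik(y, t, sigma).sum() for t in draws])
    d_bar = dev.mean()                                               # posterior mean deviance
    d_hat = -2.0 * scale * normal_loglik(y, draws.mean(), sigma).sum()  # D(theta_bar)
    p_d = d_bar - d_hat                                              # effective number of parameters
    return d_bar + p_d, p_d

def subsampled_dic(y, draws, m, rng, sigma=1.0):
    # Assumption: one simple random subsample (without replacement), reused for every
    # posterior draw, with its log-likelihood rescaled by n / m to estimate the full sum
    n = y.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    return dic(y[idx], draws, sigma=sigma, scale=n / m)

# Toy usage: flat prior on the mean with sigma = 1, so the posterior is N(ybar, 1/n)
rng = np.random.default_rng(0)
n = 50_000
y = rng.normal(1.0, 1.0, size=n)
draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=1000)

full, pd_full = dic(y, draws)
sub, pd_sub = subsampled_dic(y, draws, m=5000, rng=rng)
print(f"full DIC = {full:.1f} (p_D = {pd_full:.3f})")
print(f"subsampled DIC = {sub:.1f} (p_D = {pd_sub:.3f})")
```

In this toy model the n/m rescaling leaves the p_D estimate identical to the full-data value, so the subsampling error falls entirely on D̄; only 5,000 of the 50,000 observations are touched per posterior draw.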
