Abstract

This article considers the regression problem with sparse Bayesian learning (SBL) when the number of weights P is larger than the data size N, i.e., P ≫ N. This situation induces overfitting and makes regression tasks, such as prediction and basis selection, challenging. We propose a two-step strategy to address this problem. The first step is to apply an inverse gamma hyperprior with a shape parameter close to zero over the noise precision of the automatic relevance determination (ARD) prior. This hyperprior is associated with the concept of a weakly informative prior in terms of enhancing sparsity; the model sparsity can be controlled by adjusting the scale parameter of the inverse gamma hyperprior, which prevents overfitting. The second step is to select an optimal scale parameter, for which we develop an extended predictive information criterion (EPIC). We investigate the strategy through the relevance vector machine (RVM) with a multiple-kernel scheme that handles highly nonlinear data containing both smooth and less smooth regions. This setting is one form of the regression task with SBL in the P ≫ N situation. As an empirical evaluation, we perform regression analyses on four artificial datasets and eight real datasets. We find that overfitting is prevented, although the predictive performance is not drastically superior to that of comparative methods. Our methods allow us to select a small number of nonzero weights while keeping the model sparse, and are thus expected to be useful for basis and variable selection.
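To make the first step concrete, the following is a minimal sketch of the hierarchy the abstract describes, written in standard ARD/SBL notation; the symbols w_i, α_i, β, a, and b are illustrative choices and not necessarily the paper's own notation:

$$
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{P} \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right),
\qquad
p(\beta) = \mathrm{IG}(\beta \mid a, b), \quad a \to 0,
$$

where β denotes the noise precision of the ARD model, the shape parameter a close to zero makes the hyperprior weakly informative, and the scale parameter b is the sparsity-controlling quantity that EPIC is then used to select.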
