Enforcing sparsity constraints has been shown to be an effective and efficient way to obtain state-of-the-art results in regression and classification tasks. Unlike the support vector machine (SVM) the relevance vector machine (RVM) explicitly encodes the criterion of model sparsity as a prior over the model weights. However the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing—possibly even both at the same time (e.g. for the multiscale Doppler data). We detail an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. We present an empirical evaluation of the effects of choice of prior structure on a selection of popular data sets and elucidate the link between Bayesian wavelet shrinkage and RVM regression. Our model encompasses the original RVM as a special case, but our empirical results show that we can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance in many cases. The code is freely available.
Read full abstract