The theory of Bayesian learning incorporates the use of Student-t Processes to model heavy-tailed distributions and datasets with outliers. However, despite Student-t Processes having a similar computational complexity as Gaussian Processes, there has been limited emphasis on the sparse representation of this model. This is mainly due to the increased difficulty in modeling and computation compared to previous sparse Gaussian Processes. Our motivation is to address the need for a sparse representation framework that reduces computational complexity, allowing Student-t Processes to be more flexible for real-world datasets. To achieve this, we leverage the conditional distribution of Student-t Processes to introduce sparse inducing points. Bayesian methods and variational inference are then utilized to derive a well-defined lower bound, facilitating more efficient optimization of our model through stochastic gradient descent. We propose two methods for computing the variational lower bound, one utilizing Monte Carlo sampling and the other employing Jensen's inequality to compute the KL regularization term in the loss function. We propose adopting these approaches as viable alternatives to Gaussian processes when the data might contain outliers or exhibit heavy-tailed behavior, and we provide specific recommendations for their applicability. We evaluate the two proposed approaches on various synthetic and real-world datasets from UCI and Kaggle, demonstrating their effectiveness compared to baseline methods in terms of computational complexity and accuracy, as well as their robustness to outliers.
Read full abstract