Abstract
Exploiting prior/human knowledge is an effective way to enhance Bayesian models, especially when data are sparse or noisy and building an entirely new model is not feasible. However, there is a lack of studies on the effect of external prior knowledge in streaming environments, where data arrive sequentially and without end. In this work, we demonstrate the problem of vanishing prior knowledge in streaming variational Bayes, which is a serious drawback in various applications. We then develop a simple framework to boost the external prior when learning a Bayesian model from data streams. By boosting, the prior knowledge can be maintained and efficiently exploited through each minibatch of streaming data. We evaluate the performance of our framework in four scenarios: streaming with synthetic data, streaming sentiment analysis, streaming learning for latent Dirichlet allocation, and streaming text classification, in comparison with methods that do not retain the prior. From extensive experiments, we find that, when provided with good external knowledge, our framework can improve the performance of a Bayesian model, often by a significant margin on noisy and short text streams.
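To make the vanishing-prior phenomenon concrete, the sketch below contrasts standard streaming Bayesian updating, where the previous posterior simply becomes the next prior, with a hypothetical "boosted prior" variant that re-injects the external prior at every minibatch. This is only an illustrative toy on a conjugate Beta-Bernoulli model, not the paper's actual framework; the model choice, the boost weight `w`, and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's algorithm):
# a Beta-Bernoulli model updated over a stream of minibatches.
# Standard streaming updating lets the data swamp the external prior,
# while a hypothetical boosted-prior variant re-adds a scaled copy of
# the prior's pseudo-counts at every minibatch.

rng = np.random.default_rng(0)

# External prior knowledge: Beta(a0, b0), strongly favouring theta near 0.8.
a0, b0 = 8.0, 2.0

# Standard streaming update: previous posterior becomes the next prior.
a_std, b_std = a0, b0
# Boosted variant: same update, plus a re-injected copy of the prior.
a_boost, b_boost = a0, b0
w = 1.0  # hypothetical boost weight

for t in range(100):
    # A noisy minibatch whose empirical rate (0.5) conflicts with the prior.
    x = rng.binomial(1, 0.5, size=20)
    k, n = x.sum(), x.size

    # Conjugate update: add observed counts to the pseudo-counts.
    a_std, b_std = a_std + k, b_std + (n - k)
    # Boosted update: also re-add w * (prior pseudo-counts) each minibatch.
    a_boost, b_boost = a_boost + k + w * a0, b_boost + (n - k) + w * b0

print("standard posterior mean:", a_std / (a_std + b_std))        # prior washed out, near 0.5
print("boosted posterior mean: ", a_boost / (a_boost + b_boost))  # prior influence retained
```

In the standard run the prior's fixed pseudo-counts are quickly dominated by the accumulated data, so its influence vanishes as more minibatches arrive; in the boosted run the prior's contribution grows with the stream and its influence persists, which is the behaviour the framework is designed to provide.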