ABSTRACT
Streaming data are increasingly prevalent in diverse fields such as healthcare, finance, social media, and weather forecasting. Timely analysis is essential for extracting useful insights from these massive datasets. In this article, we assume that the streaming data are analysed in batches. Traditional offline methods, which involve storing and analysing all individual records, can be applied repeatedly to the cumulative data, but they incur substantial storage and computing costs. Existing online methods offer faster approximations, but most neglect model uncertainty, leading to overconfidence and instability. To bridge this gap, we propose novel online Bayesian approaches for generalized linear models (GLMs) that incorporate model uncertainty within a Bayesian model averaging (BMA) framework. We develop computationally efficient methods that update the posterior using individual records from the latest batch of data and summary statistics from previous batches. Using simulation studies and real data, we demonstrate that our methods offer much faster analysis than traditional methods, with no substantial drop in accuracy.