Divide-and-conquer Bayesian methods consist of three steps: dividing the data into smaller, computationally manageable subsets, running a sampling algorithm in parallel on all the subsets, and combining parameter draws from all the subsets. These methods use the combined parameter draws for efficient posterior inference in massive data settings. Existing divide-and-conquer methods have a major limitation in that their first two steps assume that the observations are independent. We address this problem by developing a divide-and-conquer method for Bayesian inference in parametric hidden Markov models, where the state space is known and finite. Our main contributions are twofold. First, we partition the data into smaller blocks of consecutive observations and modify the likelihood on every time block. For any time block, the posterior distribution computed using the modified likelihood is such that its variance has the same asymptotic order as that of the true posterior. Second, we show that if the number of subsets is chosen appropriately depending on the mixing properties of the hidden Markov chain, then the subset posterior distributions defined using the modified likelihood are asymptotically normal as the subset sample size tends to infinity. This result facilitates using any existing combination algorithm in the third step. We show that the combined posterior distribution obtained using one such algorithm is close to the true posterior distribution in 1-Wasserstein distance under widely used regularity assumptions. Our numerical results show that the proposed method provides a more accurate approximation of the true posterior distribution than its competitors in simulation studies and a real data analysis.
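As a toy illustration of the three-step pipeline (not the paper's hidden Markov model setting or its specific likelihood modification), the sketch below uses an i.i.d. Gaussian model with known variance and a flat prior. It raises each subset likelihood to the power K, a commonly used stochastic-approximation adjustment, so that each subset posterior's variance matches the asymptotic order of the full posterior, and combines the subset draws via a one-dimensional Wasserstein barycenter (the average of the subset quantile functions). All model choices here are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumption, not the paper's HMM): y_i ~ N(theta, 1),
# flat prior on theta, so the full posterior is N(mean(y), 1/n).
n, K = 10_000, 10
theta_true = 2.0
y = rng.normal(theta_true, 1.0, size=n)

# Step 1: divide the data into K blocks of consecutive observations.
blocks = np.array_split(y, K)

# Step 2: sample each subset posterior in parallel (here: exact draws).
# With the subset likelihood raised to the power K, the subset posterior
# is N(block_mean, 1 / (K * block_size)), matching the full posterior's
# asymptotic variance order 1/n.
n_draws = 5_000
subset_draws = [
    rng.normal(b.mean(), np.sqrt(1.0 / (K * b.size)), size=n_draws)
    for b in blocks
]

# Step 3: combine. In one dimension, the Wasserstein barycenter of the
# subset posteriors is obtained by averaging their empirical quantile
# functions, i.e. averaging the sorted draws across subsets.
combined = np.mean([np.sort(d) for d in subset_draws], axis=0)

# The combined draws should track the full-data posterior N(mean(y), 1/n).
print(combined.mean(), combined.std())
```

In this conjugate toy case the combined draws recover the full posterior essentially exactly; the point of the sketch is only the mechanics of partition, per-subset sampling, and quantile averaging, any of which would be replaced by MCMC on the modified HMM likelihood in the paper's setting.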