Bayesian computing, including sampling from probability distributions, learning graphical models, and Bayesian reasoning, is a powerful class of machine learning algorithms with wide applications in biological computing, financial analysis, natural language processing, autonomous driving, and robotics. The central pattern of Bayesian computing is Markov Chain Monte Carlo (MCMC) computation, which is compute-intensive and lacks explicit parallelism. In this work, we propose a parallel MCMC Bayesian computing accelerator (PMBA) architecture. Designed as a probabilistic computing platform with native support for efficient single-chain parallel Metropolis-Hastings MCMC sampling, PMBA boosts the performance of probabilistic programs through a massively parallel microarchitecture. PMBA is equipped with on-chip random number generators as a built-in source of randomness. Its sampling units are designed for parallel random sampling through a customized SIMD pipeline that synchronizes data every iteration. A corresponding computing framework supporting automatic parallelization and mapping of probabilistic programs is also developed. Evaluation results demonstrate that PMBA achieves a 17- to 21-fold speedup over a TITAN X GPU on MCMC sampling workloads. On probabilistic benchmarks, PMBA outperforms the best prior solutions by a factor of 3.6 to 10.3. An exemplar-based visual category learning algorithm is implemented on PMBA to demonstrate its efficiency and effectiveness on complex statistical learning problems.
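For context, the MCMC pattern the abstract refers to can be illustrated with a minimal random-walk Metropolis-Hastings sampler. This is a sketch in Python/NumPy for a one-dimensional target density; the function names, target, and proposal scale are illustrative only and do not represent PMBA's programming interface. The sequential dependence between iterations is what makes single-chain parallelization, PMBA's focus, nontrivial.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, proposal_std=1.0, rng=None):
    """Random-walk Metropolis-Hastings: draw n_samples from the density
    whose log-probability is given by log_target, starting from x0.
    (Illustrative sketch; not PMBA's actual sampling kernel.)"""
    rng = rng or np.random.default_rng()
    samples = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        # Propose a new state from a symmetric Gaussian random walk.
        x_new = x + proposal_std * rng.standard_normal()
        log_p_new = log_target(x_new)
        # Accept with probability min(1, p(x_new)/p(x)). Each step depends
        # on the previous state, so the chain has no explicit parallelism.
        if np.log(rng.random()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
        samples[i] = x
    return samples

# Usage example: sample from a standard normal target (assumed for illustration).
log_normal = lambda x: -0.5 * x * x
draws = metropolis_hastings(log_normal, x0=0.0, n_samples=10_000)
```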