Abstract

Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing are widely applied in the Bayesian community; however, very little attention has been given to fine-grain parallelisation using single instruction multiple data (SIMD) operations, which are available on most modern CPUs. In this work, we practically demonstrate, using standard programming libraries, the utility of the SIMD approach for several topical Bayesian applications. Using the C programming language, we show that SIMD can improve single-core floating-point arithmetic performance by up to 6× compared with scalar C code and by more than 25× compared with optimised R code. Such improvements are multiplicative with any gains achieved through multi-core processing. We illustrate the potential of SIMD for accelerating Bayesian computations and provide the reader with techniques for exploiting modern massively parallel processing environments.

Highlights

  • Practical applications in Bayesian statistics are computationally challenging since the posterior density is only known up to a normalising constant

  • We provide a practical demonstration of the computational benefits of directly accessing the single instruction multiple data (SIMD) operations of central processing units (CPUs) for approximate Bayesian computation (ABC)-based inference

  • We begin with R implementations, step through relevant optimisations within R, and then demonstrate the computational advantages of direct SIMD access using C and OpenMP


Summary

Introduction

Practical applications in Bayesian statistics are computationally challenging since the posterior density is only known up to a normalising constant. Code optimisation for hardware acceleration is standard in the high performance computing (HPC) discipline, and Bayesian practitioners can benefit from these techniques (Gillespie and Lovelace, 2017). Likelihood-free methods, such as approximate Bayesian computation (ABC), require a large number of independent prior predictive samples that can be generated in parallel. General purpose graphics processing units (GPGPUs) are highly effective at accelerating advanced Monte Carlo schemes (Klingbeil et al., 2011; Lee et al., 2010a). SIMD is widely available in modern CPUs containing vector processing units (VPUs). Our guidelines are directly applicable to the Julia language (Bezanson et al., 2017) and to external interfaces to pre-compiled C code, such as Matlab C-MEX and Rcpp combined with RcppXsimd (see Section 5).
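To make the idea of SIMD vectorisation for ABC concrete, the following is a minimal sketch in C using the OpenMP `simd` directive. It is not the authors' implementation: the function names `sq_discrepancy` and `count_accepted`, the squared Euclidean discrepancy, and the acceptance threshold `eps` are all assumptions chosen for illustration. The first function accumulates a discrepancy between simulated and observed data with a vectorised reduction; the second counts how many discrepancies fall within the threshold.

```c
#include <stddef.h>

/* Illustrative sketch only: names and the choice of discrepancy are
 * assumptions, not taken from the paper. */

/* Squared Euclidean discrepancy between simulated and observed data.
 * The simd directive asks the compiler to process several elements per
 * instruction; the reduction clause tells it how to combine the
 * per-lane partial sums into rho. */
double sq_discrepancy(const double *sim, const double *obs, size_t n)
{
    double rho = 0.0;
    #pragma omp simd reduction(+:rho)
    for (size_t i = 0; i < n; i++) {
        double d = sim[i] - obs[i];
        rho += d * d;
    }
    return rho;
}

/* Number of discrepancies that satisfy rho[i] <= eps, i.e. the number
 * of prior predictive samples an ABC rejection step would accept. */
size_t count_accepted(const double *rho, size_t n, double eps)
{
    size_t count = 0;
    #pragma omp simd reduction(+:count)
    for (size_t i = 0; i < n; i++) {
        count += (rho[i] <= eps);
    }
    return count;
}
```

With GCC or Clang, the directive is honoured when compiling with `-fopenmp-simd` (or `-fopenmp`); without either flag the pragma is ignored and the loops run as ordinary scalar code, so the same source serves as both the scalar baseline and the SIMD version. Because a vectorised reduction may sum elements in a different order from a scalar loop, floating-point results can differ in the last few bits for general inputs.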

Vectorisation with SIMD
Vectorisation and multithreading with OpenMP
Memory access and alignment
Performance analysis
A note on random number generation
Summary
A practical tutorial demonstration for R users
Prior predictive sampling for approximate Bayesian computation
Example model: a genetic toggle switch
Implementation and optimisation using R
Optimisation using C and SIMD operations
Benchmarks
Case study 1: weakly informative priors
Discussion
