Abstract
We consider parallel computation for Gaussian process calculations to overcome computational and memory constraints on the size of datasets that can be analyzed. Using a hybrid parallelization approach that combines threading (shared memory) with message passing (distributed memory), we implement the core linear algebra operations used in spatial statistics and Gaussian process regression in an R package called bigGP that relies on C and MPI. The approach divides the matrix into blocks such that the computational load is balanced across processes while communication between processes is limited. The package provides an API that lets R programmers implement Gaussian process-based methods using the distributed linear algebra operations without any C or MPI coding. We illustrate the approach and software by analyzing an astrophysics dataset with n = 67,275 observations.
Highlights
Gaussian processes are widely used in statistics and machine learning for spatial and spatiotemporal modeling [Banerjee et al., 2003], design and analysis of computer experiments [Kennedy and O’Hagan, 2001], and non-parametric regression [Rasmussen and Williams, 2006].
One popular example is the spatial statistics method of kriging, which is equivalent to conditional expectation under a Gaussian process model for the unknown spatial field.
Because of the computational and memory limitations of dense covariance matrix calculations, standard spatial statistics methods are typically applied to datasets with at most a few thousand observations.
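To make the kriging highlight concrete, the conditional expectation can be written out under some assumed notation that does not appear in the original text: a zero-mean Gaussian process with covariance function $k$, observation locations $x_1, \dots, x_n$, observations $y \in \mathbb{R}^n$, and independent noise with variance $\tau^2$. This is a standard textbook form, not necessarily the exact model used in the paper.

$$
\mathrm{E}\left[ f(x_*) \mid y \right] = k_*^\top \left( K + \tau^2 I \right)^{-1} y,
\qquad K_{ij} = k(x_i, x_j), \quad (k_*)_i = k(x_*, x_i).
$$

In practice the inverse is not formed explicitly. One factors $K + \tau^2 I = L L^\top$ by Cholesky decomposition and solves two triangular systems. The factorization costs about $n^3/3$ floating-point operations and the dense matrix needs $n^2$ stored values. For $n = 67{,}275$ that is roughly $4.5 \times 10^9$ entries, or about 36 GB in double precision, and on the order of $10^{14}$ operations, which is why a single machine's memory becomes the limiting factor.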
Summary
Gaussian processes are widely used in statistics and machine learning for spatial and spatiotemporal modeling [Banerjee et al., 2003], design and analysis of computer experiments [Kennedy and O’Hagan, 2001], and non-parametric regression [Rasmussen and Williams, 2006]. As a result of the computational and memory limitations, standard spatial statistics methods are typically applied to datasets with at most a few thousand observations. To overcome these limitations, a small industry has arisen to develop computationally efficient approaches to spatial statistics, involving reduced rank approximations [Kammann and Wand, 2003, Banerjee et al., 2008, Cressie and Johannesson, 2008], tapering the covariance matrix to induce sparsity [Furrer et al., 2006, Kaufman et al., 2008], approximation of the likelihood [Stein et al., 2004], and fitting local models by stratifying the spatial domain [Gramacy and Lee, 2008], among others. We present an algorithm and R package, bigGP, for distributed linear algebra calculations focused on those used in spatial statistics and closely related Gaussian process regression methods. We illustrate the use of the software for Gaussian process regression in an astrophysics application.
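The linear algebra that such a package distributes can be seen in miniature in a serial sketch. The Python code below is illustrative only and is not the bigGP API: it assumes a zero-mean process, a squared-exponential covariance, one-dimensional inputs, and a small noise variance, and all function names (`sq_exp_cov`, `cholesky`, `forward_solve`, `backward_solve`, `krige`) are hypothetical. It shows the Cholesky factorization and the two triangular solves that dominate the cost of Gaussian process prediction.

```python
import math


def sq_exp_cov(x1, x2, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Return lower-triangular L with L L^T = a for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def backward_solve(L, y):
    """Solve L^T x = y, reading the upper triangle from the transpose of L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def krige(x_obs, y_obs, x_new, noise_var=1e-6):
    """Zero-mean GP conditional expectation at x_new given (x_obs, y_obs)."""
    n = len(x_obs)
    # Dense n-by-n covariance matrix with noise on the diagonal: O(n^2) memory.
    K = [[sq_exp_cov(x_obs[i], x_obs[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)  # O(n^3) work
    alpha = backward_solve(L, forward_solve(L, y_obs))  # alpha = K^{-1} y
    return [sum(sq_exp_cov(xs, x_obs[i]) * alpha[i] for i in range(n))
            for xs in x_new]


if __name__ == "__main__":
    x_obs = [0.0, 1.0, 2.0]
    y_obs = [0.0, 0.8, 0.9]
    print(krige(x_obs, y_obs, [0.5, 1.5]))
```

Every step here touches the full dense matrix, so memory grows as n^2 and work as n^3. A distributed implementation has to split exactly these operations (the factorization and the triangular solves) into blocks across processes.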