Scalable Gaussian Process Research Articles

One of the most challenging problems in Gaussian process regression is coping with large-scale datasets and with an online learning setting in which data instances arrive irregularly and continuously. In this paper, we introduce a novel online Gaussian process model that scales efficiently to large datasets. The proposed model is built on the geometric and optimization views of Gaussian process regression and is hence termed geometric-based online GP (GoGP). We develop theory guaranteeing that, with a good convergence rate, the proposed algorithm always yields a sparse solution that approximates the true optimum to any level of precision specified a priori. Moreover, to further speed up GoGP when it is used with a positive semi-definite, shift-invariant kernel such as the well-known Gaussian kernel, and to address the curse of kernelization, wherein the model size grows linearly with the amount of data accumulated over time in online learning, we propose approximating the original kernel with a Fourier random feature kernel. The resulting model, GoGP with Fourier random features (GoGP-RF), can be stored directly in a finite-dimensional random feature space; it therefore avoids the curse of kernelization and scales efficiently and effectively to large datasets. We extensively evaluated the proposed methods against state-of-the-art baselines on several large-scale datasets for the online regression task. The experimental results show that our GoGP variants deliver comparable or slightly better predictive performance while achieving an order-of-magnitude computational speedup over their rivals in the online setting. More importantly, their convergence behavior, which is rapid and stable while achieving lower errors, is guaranteed by our theoretical analysis.
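The random-feature idea in the abstract above can be sketched as follows. This is a minimal illustration, not the GoGP-RF algorithm itself: the helper names (`make_rff`, `gaussian_kernel`), the lengthscale, the feature count, and the plain stochastic-gradient update are all assumptions made for the example. It shows only that a Gaussian kernel can be approximated by an inner product of random Fourier features, and that an online model kept in that feature space has a fixed size regardless of how many observations arrive.

```python
import numpy as np

def make_rff(input_dim, num_features, lengthscale, seed=0):
    """Return a map phi with phi(x) @ phi(y) ~= exp(-||x - y||^2 / (2 * lengthscale^2))."""
    rng = np.random.default_rng(seed)
    # Frequencies are drawn from the spectral density of the Gaussian kernel.
    W = rng.normal(scale=1.0 / lengthscale, size=(input_dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(X):
        X = np.atleast_2d(X)
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

    return phi

def gaussian_kernel(X, Y, lengthscale):
    """Exact Gaussian (RBF) kernel matrix, used here only as a reference."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * lengthscale**2))

# Compare the exact kernel with its random-feature approximation.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
num_features = 5000          # hypothetical choice; more features give a tighter fit
phi = make_rff(3, num_features, lengthscale=1.5)
K_exact = gaussian_kernel(X, X, 1.5)
K_approx = phi(X) @ phi(X).T
max_err = np.max(np.abs(K_exact - K_approx))

# Online regression in feature space: the model is one weight vector of
# fixed length, so its size does not grow with the number of samples seen.
# (Plain SGD on squared loss, a stand-in for the paper's update rule.)
w = np.zeros(num_features)
learning_rate = 0.5          # hypothetical step size
for _ in range(2000):
    x = rng.normal(size=3)
    y = np.sin(x[0])         # toy target for illustration only
    z = phi(x)[0]
    w -= learning_rate * (z @ w - y) * z
```

The key design point is that `w` has `num_features` entries before and after the loop, whereas a kernel expansion over the raw data would gain one term per observation.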

Biomass monitoring is vital for studying the carbon cycle of Earth's ecosystem and has significant implications, especially for understanding climate change and its impacts. Several change detection methods have recently been proposed to identify land cover changes in temporal profiles (time series) of vegetation collected with remote sensing instruments, but they fail to satisfy one or both of the requirements of the biomass monitoring problem: operating in online mode and handling periodic time series. In this paper, we adapt Gaussian process (GP) regression to detect changes in such time series in an online fashion. While GPs have been widely used as a kernel-based learning method for regression and classification, their applicability to massive spatiotemporal data sets, such as remote sensing data, has been limited by the high computational cost involved. We focus on addressing the scalability issues associated with the proposed GP-based change detection algorithm. This paper makes three contributions. First, we propose a GP-based online time series change detection algorithm and demonstrate its effectiveness in detecting different types of changes in Normalized Difference Vegetation Index (NDVI) data obtained from a study area in Iowa, USA. Second, we propose an efficient Toeplitz-matrix-based solution that significantly reduces the computational complexity and memory requirements of the proposed method. Specifically, the solution can analyze a time series of length t in O(t^2) time with an O(t) memory footprint, compared with the O(t^3) time and O(t^2) memory required by standard matrix-manipulation-based methods. Third, we describe a parallel version of the proposed solution that can analyze a large number of time series simultaneously. We study three parallel implementations: one using threads, one using the Message Passing Interface (MPI), and a hybrid using both threads and MPI. Experimental results show that the hybrid implementation scales better than the multithreaded and MPI-based implementations. We demonstrate the proposed scalable algorithm on massive remote sensing observation data. The hybrid implementation, using 1536 computing cores, can analyze an NDVI data set for the Iowa study area in nearly 5 s, whereas a serial algorithm using standard Cholesky-decomposition-based routines takes several days to process the same data set. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 430–445, 2011
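The Toeplitz saving described in the abstract can be illustrated with a short sketch. It assumes a regularly sampled time series and a stationary kernel, so that the covariance matrix depends only on the lag between samples and is fully described by its first column. The periodic kernel, its parameters, and the synthetic data are hypothetical choices for the example, and SciPy's Levinson-recursion-based `solve_toeplitz` stands in for the authors' own solver, which is not described here.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

def periodic_kernel_column(t, period, lengthscale, noise_var):
    """First column of the covariance matrix for t equally spaced samples.

    Uses a periodic (exp-sine-squared) kernel plus observation noise on the
    diagonal. Only O(t) numbers are stored.
    """
    lags = np.arange(t, dtype=float)
    col = np.exp(-2.0 * np.sin(np.pi * lags / period) ** 2 / lengthscale**2)
    col[0] += noise_var
    return col

t = 200
# All kernel parameters below are assumed values for illustration.
col = periodic_kernel_column(t, period=23.0, lengthscale=1.0, noise_var=0.1)

# Synthetic periodic series standing in for a vegetation-index profile.
rng = np.random.default_rng(0)
y = np.sin(2.0 * np.pi * np.arange(t) / 23.0) + 0.1 * rng.normal(size=t)

# Toeplitz route: O(t^2) time, and only the first column is kept in memory.
alpha_fast = solve_toeplitz(col, y)

# Dense reference: builds the full t-by-t matrix (O(t^2) memory) and uses a
# general O(t^3) solve. Included only to check the fast result.
K_dense = toeplitz(col)
alpha_dense = np.linalg.solve(K_dense, y)

max_diff = np.max(np.abs(alpha_fast - alpha_dense))
```

The vector `alpha_fast` is the usual GP weight vector (the covariance inverse applied to the observations), from which predictive means follow; the dense matrix is never needed outside the check.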

Articles published on Scalable Gaussian Process

Scalable Gaussian processes for predicting the optical, physical, thermal, and mechanical properties of inorganic glasses with large datasets

Wireless Traffic Prediction With Scalable Gaussian Process: Framework, Algorithms, and Verification

Understanding and comparing scalable Gaussian process regression for big data

GoGP: scalable geometric-based Gaussian process for online regression

Stochastic variational hierarchical mixture of sparse Gaussian processes for regression

Fast and Scalable Gaussian Process Modeling with Applications to Astronomical Time Series

On nearest-neighbor Gaussian process models for massive spatial data.

Evaluation of machine learning interpolation techniques for prediction of physical properties

A scalable gaussian process analysis algorithm for biomass monitoring
