Earth-observing satellite instruments collect massive numbers of observations every day. Despite their size, such data sets are incomplete and noisy, necessitating spatial statistical inference to obtain complete, high-resolution fields with quantified uncertainties. Such inference is challenging because of its high computational cost and the nonstationary behavior of environmental processes on a global scale. Classical methods for constructing complete maps of environmental variables can be overly simplistic and yield opaque uncertainties, thereby limiting the scientific use of these data. In this work, we address the need to construct complete, high-resolution maps of geophysical processes (Level 3 data) from incomplete, noisy, georeferenced observations (Level 2 data). We develop a multi-resolution approximation (M-RA) of a Gaussian process (GP) whose nonstationary, global covariance function is obtained using local fits. We use sea surface temperature (SST) as an application to illustrate the workflow and our computational implementation. The M-RA requires a partition of the domain, which can be tailored to the application. In the SST case, we design the partition purposefully to account for, and weaken, dependence across land barriers. Our M-RA implementation is tailored to distributed-memory computation in high-performance computing environments. We analyze a Moderate Resolution Imaging Spectroradiometer (MODIS) SST data set of more than 43 million observations, to our knowledge the largest data set ever analyzed with a probabilistic GP model. We show that our nonstationary model based on local fits provides substantially better predictive performance than a stationary approach, which itself already improves upon current methods.
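The M-RA depends on a nested, recursive partition of the domain. Each region at one resolution level is split into subregions at the next level, so every observation belongs to exactly one region per level. The minimal sketch below illustrates only this generic structure. It assumes regular 2 x 2 splits of longitude/latitude boxes (J = 4 children per region) and a fixed number of levels. It ignores spherical geometry and does not reproduce the purpose-built, land-barrier-aware SST partition described above. The names `Region`, `partition`, `count_regions`, and `locate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """Hypothetical lon/lat box at one M-RA resolution level."""
    level: int
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float
    children: List["Region"] = field(default_factory=list)


def partition(region: Region, num_levels: int) -> Region:
    """Recursively split a region into 2 x 2 subregions (assumed J = 4).

    Level 0 is the whole domain; regions at the finest level
    (num_levels - 1) are left without children.
    """
    if region.level >= num_levels - 1:
        return region
    lon_mid = 0.5 * (region.lon_min + region.lon_max)
    lat_mid = 0.5 * (region.lat_min + region.lat_max)
    for lo0, lo1 in ((region.lon_min, lon_mid), (lon_mid, region.lon_max)):
        for la0, la1 in ((region.lat_min, lat_mid), (lat_mid, region.lat_max)):
            child = Region(region.level + 1, lo0, lo1, la0, la1)
            region.children.append(partition(child, num_levels))
    return region


def count_regions(root: Region) -> List[int]:
    """Number of regions at each level, coarsest first."""
    counts = []
    frontier = [root]
    while frontier:
        counts.append(len(frontier))
        frontier = [c for r in frontier for c in r.children]
    return counts


def locate(root: Region, lon: float, lat: float) -> List[Region]:
    """Return the nested chain of regions (one per level) containing a point."""
    if not (root.lon_min <= lon <= root.lon_max and root.lat_min <= lat <= root.lat_max):
        raise ValueError("point lies outside the domain")
    chain = [root]
    region = root
    while region.children:
        for child in region.children:
            # Closed intervals: a point on a shared edge goes to the first match.
            if child.lon_min <= lon <= child.lon_max and child.lat_min <= lat <= child.lat_max:
                region = child
                break
        chain.append(region)
    return chain


root = partition(Region(0, -180.0, 180.0, -90.0, 90.0), num_levels=4)
print(count_regions(root))  # [1, 4, 16, 64]
```

In an actual M-RA, each region would also carry its own basis functions and observations. This sketch shows only the bookkeeping that lets work for different regions be distributed across processes.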