Abstract

Large-scale statistical modeling has become necessary to cope with the vast flood of geospatial data coming from various sources. In spatial statistics, Maximum Likelihood Estimation (MLE) is widely used for modeling geospatial data by estimating a set of statistical parameters of a predefined covariance function. This covariance function describes the correlation between a set of geospatial locations, where the main goals are to model the given data samples and to impute missing data. Climate/weather modeling is a prevalent application of the MLE operation, where data interpolation and forecasting are highly required. In the literature, the Gaussian random field is often used to describe geospatial data and is one of the most popular models for MLE. However, real-life datasets are often skewed and/or have extreme values, and non-Gaussian random field models are more appropriate for capturing such features. In this work, we provide exact and approximate parallel implementations of the well-known Tukey g-and-h (TGH) non-Gaussian random field in the context of climate/weather applications. The proposed implementation alleviates the computational complexity of the log-likelihood function, which requires O(n^2) storage and O(n^3) operations, where n = N × M, N is the number of geospatial locations, and M is the number of time slots. Based on tile low-rank (TLR) approximations, our implementation of the TGH model can tackle large-scale problems. Furthermore, we rely on task-based programming models and dynamic runtime systems to provide fast execution of the MLE operation in the space and space-time cases. We assess the performance and accuracy of the proposed implementations using synthetic space and space-time datasets with up to 800K locations. We also consider a 12-month precipitation dataset from Germany to demonstrate the advantage of non-Gaussian over Gaussian random field models. We evaluate the prediction accuracy of the TGH model on the precipitation dataset using the Probability Integral Transform (PIT), showing that the TGH model outperforms Gaussian modeling on this real dataset. Moreover, our performance assessment indicates that TLR computations allow solving larger matrix sizes while preserving the accuracy required for prediction. The TLR-based approximation achieves speedups of up to 7.29X and 2.96X over the exact solution.
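For readers unfamiliar with the model, the following is a minimal sketch (not the paper's implementation, which targets large-scale parallel systems) of the marginal Tukey g-and-h transformation that gives the TGH random field its name: a standard Gaussian variable z is mapped to ξ + ω · (e^{gz} − 1)/g · e^{hz²/2}, where g controls skewness and h ≥ 0 controls tail heaviness. The parameter values below are illustrative only.

    import numpy as np

    def tukey_gh(z, g=0.5, h=0.1, xi=0.0, omega=1.0):
        """Apply the Tukey g-and-h transformation to Gaussian samples z.

        g controls skewness, h >= 0 controls tail heaviness; xi and omega
        are location and scale. Parameter values are illustrative only.
        """
        if g == 0.0:
            core = z  # limiting case g -> 0: no skewness
        else:
            core = (np.exp(g * z) - 1.0) / g
        return xi + omega * core * np.exp(h * z**2 / 2.0)

    # Transform Gaussian draws into a skewed, heavy-tailed sample.
    rng = np.random.default_rng(0)
    z = rng.standard_normal(100_000)
    y = tukey_gh(z)
    skew = ((y - y.mean()) ** 3).mean() / y.std() ** 3
    print(f"sample skewness: {skew:.2f}, sample max: {y.max():.1f}")

With g = 0 the transform reduces (in the limit) to a purely heavy-tailed distribution, and with g = h = 0 it recovers the Gaussian; this is why the TGH family nests the Gaussian random field as a special case, making the two models directly comparable on the same data.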
