High-Resolution Spatiotemporal Forecasting with Missing Observations Including an Application to Daily Particulate Matter 2.5 Concentrations in Jakarta Province, Indonesia

I Gede Nyoman Mindra Jaya, Henk Folmer

doi:10.3390/math12182899

Abstract

Accurate forecasting of high-resolution particulate matter 2.5 (PM2.5) levels is essential for the development of public health policy. However, the datasets used for this purpose often contain missing observations. This study presents a two-stage approach to this problem. The first stage is a multivariate spatial time series (MSTS) model, which generates forecasts for the sampled spatial units and imputes missing observations. The MSTS model exploits the similarities between the temporal patterns of the spatial units' time series to impute missing data across space. The second stage is a high-resolution prediction model that generates predictions covering the entire study domain. This stage faces the big N problem, which creates heavy memory and computational demands. To address it, we propose a Gaussian Markov random field (GMRF) for the innovations, with a Matérn covariance structure derived from the corresponding Gaussian field (GF) by means of the stochastic partial differential equation (SPDE) method and the finite element method (FEM). For inference, we use Bayesian statistics with integrated nested Laplace approximation (INLA), as implemented in the R-INLA package. The approach is demonstrated using daily data from 13 PM2.5 monitoring stations in Jakarta Province, Indonesia, covering 1 January–31 December 2022. The first stage generates PM2.5 forecasts for the 13 monitoring stations for 1–31 January 2023, imputing missing data by means of the MSTS model. To capture temporal trends in PM2.5 concentrations, the model includes a first-order autoregressive process and a seasonal process. The second stage creates a high-resolution map for 1–31 January 2023 covering both sampled and non-sampled spatiotemporal units.
It uses the MSTS-generated PM2.5 predictions for the sampled spatiotemporal units, together with observations of the covariates altitude, population density, and rainfall for sampled and non-sampled spatiotemporal units. For the spatially correlated random effects, we apply a first-order random walk process. Validation of the out-of-sample forecasts indicates a strong model fit, with low mean squared error (0.001), mean absolute error (0.037), and mean absolute percentage error (0.041), and a high R² (0.855). The analysis shows that altitude and rainfall have a negative effect on PM2.5 concentrations, whereas population density has a positive effect. Specifically, a one-meter increase in altitude is linked to a 7.8% decrease in PM2.5, and a one-person increase in population density to a 7.0% increase. A one-millimeter increase in rainfall corresponds to a 3.9% decrease in PM2.5. The paper contributes to the forecasting of high-resolution PM2.5 levels, which is essential for providing detailed and accurate information for public health policy, and offers a new method for addressing both missing data and high-resolution forecasting.
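The validation statistics and percentage effects reported above follow standard definitions. The minimal Python sketch below shows how they are conventionally computed. Several assumptions apply, and none of them is confirmed by the text. R² is taken as 1 − SS_res/SS_tot rather than a squared correlation. MAPE is expressed as a fraction, not a percentage, to match the 0.041 reported. The covariate effects are treated as coefficients on the log scale of PM2.5, so that a one-unit increase multiplies the response by exp(β). The function names `forecast_metrics` and `percent_change` are hypothetical.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MSE, MAE, MAPE (as a fraction) and R^2 for paired observed/predicted values.

    Hypothetical helper; R^2 is assumed to be 1 - SS_res / SS_tot.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observation is zero")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Mean absolute relative error, kept as a fraction (multiply by 100 for percent).
    mape = sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "R2": r2}


def percent_change(beta: float) -> float:
    """Percentage change in the response per one-unit covariate increase.

    Assumes (hypothetically) that the covariate enters linearly on the log scale,
    so the response is multiplied by exp(beta) per unit increase.
    """
    return 100.0 * (math.exp(beta) - 1.0)
```

Under the log-scale assumption, the 7.8% decrease per meter of altitude would correspond to a coefficient of ln(1 − 0.078) ≈ −0.081. For small coefficients, 100·β is a close approximation to the percentage change.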

Full Text