An Efficient Parallel Implementation of the Ensemble Kalman Filter Based on Shrinkage Covariance Matrix Estimation

Elías D Niño-Ruiz, Adrian Sandu

doi:10.1109/hipcw.2015.17

Abstract

This paper develops a parallel implementation of the ensemble Kalman filter (EnKF) based on shrinkage covariance matrix estimation. The EnKF is a sequential Monte Carlo method for parameter and state estimation of highly nonlinear models. In ensemble-based methods, background error correlations are estimated from an ensemble of model realizations. In practice, the model dimension (number of model components) is several times larger than the ensemble size owing to the computational effort involved in a single model propagation. This constraint leads to spurious correlations (correlations between model components that are distant in space) in the background error estimates. Commonly, localization methods are used to reduce the impact of spurious correlations; for instance, correlations between model components are reduced based on some function of their physical distance. Some EnKF implementations are formulated in such a manner that localization is implicitly involved in the assimilation of observations. In this context, one of the best EnKF implementations is the local ensemble transform Kalman filter (LETKF). In the LETKF, each model component is surrounded by a local box of size r, and the information contained within the local domain (i.e., observed components) is used to perform the assimilation. However, for sparse observational networks, r can become large relative to the ensemble size, and the local analysis corrections can then be impacted by spurious correlations. We propose an EnKF implementation in which the background error covariance matrix is estimated via the Rao-Blackwell Ledoit and Wolf (RBLW) estimator. This estimator has proven to be better conditioned than the true (background) error covariance matrix. In addition, the explicit computation of the RBLW estimator is not required in our formulation. This implies enormous savings in terms of memory, considering the typical order of background error covariance matrices, O(10e8). Furthermore, the proposed EnKF implementation can be performed efficiently in parallel: each model component is surrounded by a local box, a local background error covariance matrix is estimated based on the RBLW estimator, and the EnKF equations are then applied locally to compute the local analysis corrections. After the local assimilation, each model component is mapped back onto the full model domain, from which the global analysis solution is obtained. In operational data assimilation, EnKF implementations are attractive owing to their potential for parallelization. Experiments are performed using the Atmospheric General Circulation Model SPEEDY. The proposed method is coded in FORTRAN, and MPI is used for the local assimilation of observations. The predicted variables are the zonal and meridional wind components, the temperature, and the specific humidity. For each model variable, 8 layers at the T-63 resolution (192 x 96 grid points) are considered, which gives a total of 4 x 8 x 192 x 96 = 589,824 model components. The ensemble size is 94. The numbers of processors used are 96, 256, 512, 768, 1024 and 1536. The results reveal that the use of the RBLW estimator can mitigate the impact of spurious correlations under sparse observational networks. Moreover, for the different numbers of processors, the computational effort of the proposed EnKF implementation is comparable to that of the LETKF, in which no covariance estimation is performed. In terms of accuracy, the proposed implementation performs better than the LETKF formulation in the Root Mean Square Error sense.
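The local covariance estimation step described in the abstract can be illustrated with a small sketch. The abstract does not state the estimator's formula, so the code below assumes the RBLW shrinkage coefficient in the form given by Chen, Wiesel, Eldar and Hero (2010), with the ensemble size taken as the sample count. The function names (`sample_covariance`, `rblw_shrinkage`) and the toy local box are hypothetical, and the matrix is formed explicitly only for illustration, whereas the proposed formulation avoids doing so.

```python
import random


def sample_covariance(ensemble):
    """Sample covariance (p x p) of an ensemble given as N members of length p."""
    n_members = len(ensemble)
    p = len(ensemble[0])
    mean = [sum(member[i] for member in ensemble) / n_members for i in range(p)]
    dev = [[member[i] - mean[i] for i in range(p)] for member in ensemble]
    return [
        [sum(d[i] * d[j] for d in dev) / (n_members - 1) for j in range(p)]
        for i in range(p)
    ]


def rblw_shrinkage(ensemble):
    """Shrink the sample covariance toward a scaled identity target.

    Assumed form of the coefficient (not taken from the abstract):
      gamma = min(1, ((N-2)/N * tr(S^2) + tr(S)^2) / ((N+2) * (tr(S^2) - tr(S)^2 / p)))
    Returns (B, gamma) with B = (1 - gamma) * S + gamma * (tr(S) / p) * I.
    """
    n_members = len(ensemble)
    p = len(ensemble[0])
    S = sample_covariance(ensemble)
    tr_S = sum(S[i][i] for i in range(p))
    # For a symmetric matrix, tr(S @ S) equals the sum of squared entries.
    tr_S2 = sum(S[i][j] ** 2 for i in range(p) for j in range(p))
    denom = (n_members + 2) * (tr_S2 - tr_S ** 2 / p)
    if denom <= 0.0:
        gamma = 1.0  # S is (numerically) a multiple of the identity
    else:
        numer = (n_members - 2) / n_members * tr_S2 + tr_S ** 2
        gamma = min(1.0, numer / denom)
    mu = tr_S / p  # average variance, used to scale the identity target
    B = [
        [(1.0 - gamma) * S[i][j] + (gamma * mu if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    return B, gamma


# Toy local box: 20 components, 5 ensemble members (p > N, as in the abstract).
random.seed(0)
local_ensemble = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(5)]
B_local, gamma_local = rblw_shrinkage(local_ensemble)
print("shrinkage coefficient:", gamma_local)
```

Because the target is a scaled identity, shrinkage leaves the trace (total variance) unchanged and scales every off-diagonal entry by 1 - gamma, which is how sampling noise in correlations between distant components is damped. In a parallel setting, each process would apply such an estimate to its own local boxes independently.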

Full Text