Impact of missing data on the efficiency of homogenisation: experiments with ACMANTv3

Peter Domonkos, John Coll

doi:10.1007/s00704-018-2488-3

Abstract

The impact of missing data on the efficiency of homogenisation with ACMANTv3 is examined with simulated monthly surface air temperature test datasets. The homogeneous database is derived from an earlier benchmarking of daily temperature data in the USA, and then outliers and inhomogeneities (IHs) are randomly inserted into the time series. Three inhomogeneous datasets are generated and used: one with relatively few and small IHs, one with IHs of medium frequency and size, and one with large and frequent IHs. All of the inserted IHs are changes to the means. Most of the IHs are single sudden shifts or pairs of shifts resulting in platform-shaped biases. Each test dataset consists of 158 time series, each 100 years long, and their mean spatial correlation is 0.68–0.88. To examine the impacts of missing data, seven experiments are performed, in which 18 series are left complete, while variable quantities (10–70%) of the data of the other 140 series are removed. The results show that data gaps have a greater impact on the monthly root mean squared error (RMSE) than on the annual RMSE and trend bias. When data with a large ratio of gaps are homogenised, the reduction of the upper 5% of the monthly RMSE is the least successful, but even there, the efficiency remains positive. In terms of reducing the annual RMSE and trend bias, the efficiency is 54–91%. In all cases, the inclusion of short and incomplete series with sufficient spatial correlation improves the efficiency of homogenisation with ACMANTv3.
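The experimental design summarised above can be illustrated with a minimal Python sketch. It is not the authors' code and makes several assumptions: gaps are scattered at random over individual months (the abstract does not say how the removed data are distributed), and efficiency is taken as the percentage reduction of an error measure relative to the raw, unhomogenised data. The function names `make_gap_mask`, `rmse` and `efficiency` are hypothetical.

```python
import math
import random


def make_gap_mask(n_series=158, n_months=1200, n_complete=18,
                  gap_fraction=0.3, seed=0):
    """Return a list of boolean rows; True marks a removed (missing) month.

    The first `n_complete` series stay complete. Every other series loses
    `gap_fraction` of its values at randomly chosen positions (an
    assumption made for this sketch).
    """
    rng = random.Random(seed)
    n_missing = round(gap_fraction * n_months)
    mask = []
    for i in range(n_series):
        row = [False] * n_months
        if i >= n_complete:
            for j in rng.sample(range(n_months), n_missing):
                row[j] = True
        mask.append(row)
    return mask


def rmse(series, truth):
    """Root mean squared error over the non-missing (not None) values."""
    pairs = [(s, t) for s, t in zip(series, truth) if s is not None]
    return math.sqrt(sum((s - t) ** 2 for s, t in pairs) / len(pairs))


def efficiency(raw_error, homogenised_error):
    """Percentage of the raw error removed by homogenisation (assumed definition)."""
    return 100.0 * (raw_error - homogenised_error) / raw_error
```

With the defaults, each series has 1200 monthly values (100 years), 18 series are left complete and the other 140 each lose 30% of their values. Under the assumed definition, an efficiency of 100 means the error is removed entirely, and a negative value means homogenisation made the error larger.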

Full Text