Background

Global mean surface temperature is widely used in the climate literature as a measure of the impact of human activity on the climate system. While the concept of a spatial average is simple, estimating that average from spatially incomplete data is not. Because nearby map grid cells are correlated, missing data cannot simply be ignored. Estimators that (often implicitly) assume uncorrelated observations can be biased when naively applied to the observed data; in particular, the commonly used area-weighted average is a biased estimator under these circumstances. Some surface temperature products use various forms of infilling or imputation to estimate temperatures for regions far from the nearest observation; however, the impact of such methods on the estimated global mean is not always obvious, and the methods are not necessarily unbiased themselves. Ruvim Kagan addressed this issue in the 1970s, but his work has not been widely adopted, possibly because of its complexity and its reliance on subjective choices in estimating the statistical dependence between geographically proximate observations.

Objectives

The aim of this work is to present a simple estimator of global mean surface temperature from spatially incomplete data that retains many of the benefits of Kagan's work while being fully specified by two equations and a single parameter. The main purpose of the simplified estimator is to explain more clearly to users of temperature data the problems involved in estimating an unbiased global mean from spatially incomplete data; however, the estimator may also be useful for problems with specific requirements for reproducibility and performance.

Methods

The new estimator is based on generalized least squares (GLS) and uses the correlation matrix of the observations to weight each observation according to the independent information it contributes. It can be implemented in fewer than 20 lines of computer code.
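The Methods description gives the estimator only in outline. A minimal sketch, assuming the textbook GLS estimate of a common mean, mu = (1^T R^-1 y) / (1^T R^-1 1), and assuming an exponential correlation model R_ij = exp(-d_ij / L) of great-circle distance d_ij, with the length scale L playing the role of the single parameter. The correlation model, the function names (great_circle_km, gls_weights, gls_mean) and the Earth-radius constant are assumptions for illustration; the paper's exact two equations may differ, for example in how grid-cell areas enter.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(p, q):
    """Great-circle (haversine) distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gls_weights(points, length_km):
    """GLS weights w = R^-1 1 / (1^T R^-1 1), assuming R_ij = exp(-d_ij / L)."""
    R = [[math.exp(-great_circle_km(p, q) / length_km) for q in points]
         for p in points]
    u = solve(R, [1.0] * len(points))
    total = sum(u)
    return [ui / total for ui in u]

def gls_mean(values, points, length_km):
    """GLS estimate of the common mean of the observed anomalies."""
    w = gls_weights(points, length_km)
    return sum(wi * yi for wi, yi in zip(w, values))
```

For two cells 1 degree apart on the equator plus one isolated cell 90 degrees away, `gls_weights([(0, 0), (0, 1), (0, 90)], 1000.0)` gives roughly 0.26 to each clustered cell and about 0.49 to the isolated one, instead of one third each under a plain average: the correlated pair is treated as carrying only a little more independent information than a single observation.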
The performance of the estimator is evaluated for different levels of observational coverage using reanalysis data with added artificial noise.

Results

For recent decades, the generalized least squares estimator mitigates most of the error associated with a naive area-weighted average. The improvement is possible because coverage bias in the historical temperature record (at least for recent decades) stems not from an absolute shortage of observations but from their spatially heterogeneous distribution, with some regions relatively undersampled and others oversampled. The estimator addresses this problem by reducing the weight of the oversampled regions, in contrast to some existing global temperature datasets, which extrapolate temperatures into the unobserved regions. The results are almost identical to those obtained by using kriging (Gaussian process interpolation) to impute missing data to global coverage and then taking the area-weighted average of the resulting field. However, the new formulation allows direct diagnosis of the contribution of individual observations and of the sources of error.

Conclusions

More sophisticated solutions to the problem of missing data in global temperature estimation already exist. However, the simple estimator presented here, and the error analysis it enables, demonstrate why such solutions are necessary. The 2013 Fifth Assessment Report of the Intergovernmental Panel on Climate Change discussed a slowdown in warming for the period 1998-2012, quoting the trend in the HadCRUT4 historical temperature dataset from the United Kingdom Meteorological Office in collaboration with the Climatic Research Unit of the University of East Anglia, along with other records. Applying the new estimator to the spatially incomplete HadCRUT4 product would have reduced the apparent slowdown in early-21st-century warming by one third.
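The Results state that the GLS formulation allows direct diagnosis of the contribution of individual observations. A minimal sketch of why, assuming both estimators are linear weighted sums of the same observations: if the GLS estimate is sum_i w_i y_i and the area-weighted average is sum_i a_i y_i, their difference splits exactly into per-observation terms (w_i - a_i) y_i. The cos(latitude) area weights, the function names, and the weights and anomaly values below are hypothetical, chosen only to illustrate the bookkeeping.

```python
import math

def area_weights(lats_deg):
    """Normalised cos(latitude) weights, a hypothetical stand-in for grid-cell areas."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(raw)
    return [r / total for r in raw]

def contribution_breakdown(values, area_w, gls_w):
    """Per-observation terms (w_i - a_i) * y_i.

    Their sum equals the GLS estimate minus the area-weighted average,
    so the largest terms flag the observations that drive the difference.
    """
    return [(w - a) * y for y, a, w in zip(values, area_w, gls_w)]

# Hypothetical example: two clustered cells and one isolated cell on the equator.
values = [0.2, 0.3, 1.0]           # illustrative temperature anomalies
a = area_weights([0.0, 0.0, 0.0])  # equal area weights of 1/3 each
w = [0.25, 0.25, 0.5]              # hypothetical GLS weights, not from real data

terms = contribution_breakdown(values, a, w)
gls = sum(wi * y for wi, y in zip(w, values))
naive = sum(ai * y for ai, y in zip(a, values))
print([round(t, 3) for t in terms], round(gls, 3), round(naive, 3))
```

In this toy case the isolated cell contributes about +0.167 to the difference while the clustered cells contribute small negative terms, so it is the isolated observation that lifts the GLS estimate (0.625) above the naive area-weighted average (0.5).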