Abstract

Historical observations of temperature underpin our ability to monitor Earth's climate. We identify a pervasive issue in archived observations from surface stations, wherein the use of varying conventions for units and precision has led to distorted distributions of the data. Apart from the original precision being generally unknown, the majority of archived temperature data are found to be misaligned with the original measurements because of rounding on a Fahrenheit scale, conversion to Celsius, and re-rounding. Furthermore, we show that commonly used statistical methods, including quantile regression, are sensitive to the finite precision and to double-rounding of the data after unit conversion. To remedy these issues, we present a hidden Markov model that uses the differing frequencies of specific recorded values to recover the most likely original precision and units associated with each observation. This precision-decoding algorithm is used to infer the precision of the 644 million daily surface temperature observations in the Global Historical Climatology Network-Daily database, providing more accurate values for the 63% of samples found to have been biased by double-rounding. The average absolute bias correction across the dataset is 0.018 °C, and the average inferred precision is 0.41 °C, even though data are archived at 0.1 °C precision. These results permit better inference of when record temperatures occurred, correction of rounding effects, and identification of inhomogeneities in surface temperature time series, amongst other applications. The precision-decoding algorithm is generally applicable to rounded observations, including surface pressure, humidity, precipitation, and other temperature data, thereby offering the potential to improve quality-control procedures for many datasets.
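
To make the double-rounding mechanism concrete, the following minimal Python sketch (illustrative only and not code from the paper; the function name, the uniform distribution of true temperatures, and the 1.0 °F original precision are assumptions) simulates the archival path described above: measurements are rounded on a Fahrenheit scale, converted to Celsius, and re-rounded to the 0.1 °C archive precision.

```python
import numpy as np

def double_round(true_c, orig_precision_f=1.0, archive_precision_c=0.1):
    """Simulate the archival path: round on a Fahrenheit scale,
    convert to Celsius, then re-round to the archive precision."""
    true_f = true_c * 9.0 / 5.0 + 32.0                        # measure in Fahrenheit
    rounded_f = np.round(true_f / orig_precision_f) * orig_precision_f
    converted_c = (rounded_f - 32.0) * 5.0 / 9.0              # convert to Celsius
    return np.round(converted_c / archive_precision_c) * archive_precision_c

rng = np.random.default_rng(0)
true_c = rng.uniform(-30.0, 40.0, size=1_000_000)             # synthetic true values (deg C)
archived_c = double_round(true_c)
print("mean |archived - true|: %.3f degC" % np.mean(np.abs(archived_c - true_c)))
```

The displacement reported by this sketch reflects both the coarse original precision and the re-rounding step; it is the systematic part of this displacement that the precision-decoding correction addresses.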

Highlights

  • Temperature maxima and minima measured at surface weather stations are the primary observational source for information about the last two centuries of climate, forming the bulk of pre-satellite observations (Thorne et al., 2011).

  • The algorithm is general and can be applied to any time series; for specificity, we describe it with reference to temperature data from the Global Historical Climatology Network-Daily database (GHCND; Menne et al., 2012).

  • The analysis presented here is by no means an exhaustive study of precision variability within the GHCND database, and precision decoding will be most useful when used in tandem with other quality-control tools.

Introduction

Temperature maxima and minima measured at surface weather stations are the primary observational source for information about the last two centuries of climate, forming the bulk of pre-satellite observations (Thorne et al., 2011). Data in the Global Historical Climatology Network-Daily database (GHCND; Menne et al., 2012) are archived at 0.1 °C precision, but many of the underlying measurements were originally recorded at other precisions, such as 0.1 °F or 1.0 °F, and subsequently converted. A substantial double-rounding effect can be seen in the frequency of each possible tenths digit in the GHCND data (Figure 1(a)), in which zeros are systematically over-represented (comprising 14% of all samples) and fives are under-represented (comprising only 4% of samples). The inflated zero counts alone might be explained by digit preferences of station operators or by some observations having originally been of 1.0 °C precision, but the under-representation of fives suggests a further effect, given the large sample size and the relatively uniform frequency of the remaining digits. The under-representation of fives (Figure 1(a)) can be understood as a consequence of the GHCND database containing some data that were originally of 1.0 °F precision, a fact that has been previously noted (Zhang et al., 2009), but for which the consequences of double-rounding have not yet been assessed. We introduce an algorithm, termed precision decoding, that accurately infers the original precision and units of observations (section 2); we then demonstrate its operation on synthetic data (section 3), present results and implications (section 4), and offer conclusions (section 5).
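
The tenths-digit signature of such data is straightforward to reproduce. In the sketch below (again an illustrative Python example under the assumption of uniformly distributed true temperatures, not code from the paper), synthetic measurements are rounded to whole degrees Fahrenheit, converted to Celsius, and re-rounded to 0.1 °C; tabulating the tenths digits shows that fives essentially never occur in data that were originally of 1.0 °F precision, consistent with their under-representation in Figure 1(a).

```python
import numpy as np

rng = np.random.default_rng(0)
true_c = rng.uniform(-30.0, 40.0, size=1_000_000)  # synthetic true values (deg C)

# Archival path for 1.0 degF-precision data: round to whole Fahrenheit,
# convert to Celsius, re-round to the 0.1 degC archive precision.
whole_f = np.round(true_c * 9.0 / 5.0 + 32.0)
archived_c = np.round((whole_f - 32.0) * 5.0 / 9.0, 1)

# Tabulate the tenths digit of each archived value.
tenths = np.round(np.abs(archived_c) * 10.0).astype(int) % 10
counts = np.bincount(tenths, minlength=10)
for digit, count in enumerate(counts):
    print(digit, "%5.1f%%" % (100.0 * count / counts.sum()))
```

In the GHCND itself the fives do not vanish entirely because the archive mixes records of several original precisions and units; it is precisely these differing frequencies of recorded values that the precision-decoding algorithm exploits.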
