Abstract

The “3/5 rule” is a commonly used rule‐of‐thumb for dealing with missing data when calculating monthly climate normals. The rule states that any month that is missing more than three consecutive daily values, or more than five daily values in total, should not be included in calculated monthly climate normals. We quantify the impact of missing data in a given year–month when between 1 and 25 daily values are missing, and thereby describe the error that the “3/5 rule” (and a related rule that we have dubbed the “4/10 rule”) permits. We tested the statistical robustness of these rules using observed temperature data from a temperate station and a tropical station. We show that, for observed data, the “3/5 rule” permits an average of between 0.06 and 0.07 standard deviations of error in the calculated monthly mean (ɛ) when three consecutive or five random values are missing. For its part, the “4/10 rule” permits a maximum ɛ of between 0.07 and 0.09 when four consecutive values are missing, or up to 0.10 when 10 random values are missing. The proportional impact of missing values was similar across variables. We performed a correlation analysis and show that each additional missing value from a year–month of data increases ɛ by between 0.008 and 0.018 for up to 19 missing values. There is a significant relationship between the lag‐1 autocorrelation of a year–month and ɛ. Simple linear interpolation can reduce ɛ when values are missing at random and the year–month exhibits lag‐1 autocorrelation. Overall, we find that the application of any “rule‐of‐thumb” should be based on the particular characteristics of the source data and the goals of the research project.
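
The rules and the error measure described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and ɛ is assumed here to be the absolute difference between the estimated and the complete monthly mean, divided by the sample standard deviation of the complete month. The interpolation helper fills interior gaps linearly and, as a further assumption, copies the nearest observed value into gaps at the start or end of the month.

```python
from statistics import mean, stdev


def longest_run(missing):
    """Length of the longest run of consecutive missing days."""
    longest = current = 0
    for flag in missing:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


def passes_rule(missing, max_consecutive=3, max_total=5):
    """True if the month may be kept under an 'a/b' rule (3/5 by default)."""
    return longest_run(missing) <= max_consecutive and sum(missing) <= max_total


def mean_ignoring_missing(values, missing):
    """Monthly mean computed from the observed days only."""
    return mean(v for v, m in zip(values, missing) if not m)


def interpolate_linear(values, missing):
    """Fill each gap linearly between its nearest observed neighbours.

    Gaps touching either end of the month take the nearest observed value
    (an assumption of this sketch).
    """
    n = len(values)
    if all(missing):
        raise ValueError("cannot interpolate a month with no observations")
    filled = list(values)
    i = 0
    while i < n:
        if not missing[i]:
            i += 1
            continue
        start = i
        while i < n and missing[i]:
            i += 1
        left, right = start - 1, i  # indices of the observed neighbours
        for k in range(start, i):
            if left >= 0 and right < n:
                frac = (k - left) / (right - left)
                filled[k] = values[left] + frac * (values[right] - values[left])
            elif left >= 0:
                filled[k] = values[left]
            else:
                filled[k] = values[right]
    return filled


def epsilon(complete_values, estimated_mean):
    """Error of an estimated monthly mean, in standard deviations (assumed definition)."""
    return abs(estimated_mean - mean(complete_values)) / stdev(complete_values)
```

To use it, take a complete month of daily values, mark some days as missing in a Boolean mask, and compare `epsilon` for the mean from `mean_ignoring_missing` with `epsilon` for the mean of the series returned by `interpolate_linear`. Calling `passes_rule(mask, 4, 10)` checks the “4/10 rule” instead of the default “3/5 rule”.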
