Abstract
Analysis of high‐resolution data offers greater opportunity to understand the nature of data variability, behaviours and trends, and to detect small changes. Climate studies often require complete time series, which means that imputation must be undertaken when data are missing. Research on the imputation of high‐resolution temporal climate time series data is still at an early stage. In this study, multiple approaches to the imputation of missing values were evaluated, including a structural time series model with Kalman smoothing, an autoregressive integrated moving average (ARIMA) model with Kalman smoothing, and multiple linear regression. The methods were applied to complete subsets of 12‐month time series of hourly temperature, humidity and wind speed data from four locations along the coast of Western Australia. Assuming that observations were missing at random, artificial gaps were introduced and evaluated using a five‐fold cross‐validation methodology, with the proportion of missing data set to 10%. The techniques were compared using the pooled mean absolute error, root mean square error and symmetric mean absolute percentage error. Based on the pooled performance indicators, the multiple linear regression model generally performed best, followed by the ARIMA model with Kalman smoothing. However, the low error values obtained with every approach suggest that the models performed comparably and imputed highly plausible values. Model performance varied to some extent among locations. The modelling approaches studied are therefore suitable for imputing missing values in hourly temperature, humidity and wind speed data, and are recommended for application in other fields where high‐resolution data with missing values are common.
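The evaluation design described above (artificial gaps, five folds, 10% missing) can be sketched in Python. This is a minimal illustration rather than the authors' procedure. It assumes that each fold independently hides a uniformly random 10% of the time steps of a complete series, and the function names `make_gap_folds` and `apply_mask` are hypothetical.

```python
import random


def make_gap_folds(n, n_folds=5, missing_prop=0.10, seed=42):
    """Return, for each fold, a sorted list of time-step indices to hide.

    Assumption: every fold independently hides a uniformly random sample of
    round(missing_prop * n) distinct time steps from a complete series.
    """
    rng = random.Random(seed)  # seeded so the folds are reproducible
    k = round(missing_prop * n)
    return [sorted(rng.sample(range(n), k)) for _ in range(n_folds)]


def apply_mask(series, masked):
    """Return a copy of series with the masked positions set to None."""
    hidden = set(masked)
    return [None if i in hidden else x for i, x in enumerate(series)]
```

For a leap-year hourly series of 8,784 points, `make_gap_folds(8784)` returns five folds of 878 hidden indices each. Each imputation method would then be run on `apply_mask(series, fold)`, and its errors measured only at the hidden positions.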
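The three indicators named above can be written out as follows. The MAE and RMSE definitions are standard. sMAPE has several variants in the literature, so the one used here (100/n Σ|F − A| / ((|A| + |F|)/2), with a 0/0 term counted as 0) is an assumption. The reading of "pooled" as concatenating the held-out values from all five folds before computing each metric is also an assumption, and `pooled` is a hypothetical helper name.

```python
import math


def mae(actual, imputed):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, imputed)) / len(actual)


def rmse(actual, imputed):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, imputed)) / len(actual))


def smape(actual, imputed):
    """Symmetric MAPE in percent (one common variant, assumed here).

    100/n * sum(|f - a| / ((|a| + |f|) / 2)); a pair with a == f == 0 adds 0.
    """
    total = 0.0
    for a, f in zip(actual, imputed):
        denom = (abs(a) + abs(f)) / 2
        if denom > 0:
            total += abs(f - a) / denom
    return 100 * total / len(actual)


def pooled(metric, folds):
    """Concatenate (actual, imputed) pairs from all folds, then apply metric once."""
    actual = [a for acts, _ in folds for a in acts]
    imputed = [f for _, imps in folds for f in imps]
    return metric(actual, imputed)
```

For example, `mae([10, 20], [12, 18])` and `rmse([10, 20], [12, 18])` both return 2.0. The zero guard in `smape` matters for wind speed, where calm hours can give an observed and an imputed value of exactly 0.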
Highlights
This study evaluated univariate time series models fitted by state-space methods, and multiple linear regression models driven by other climatic variables, for imputing missing values in 12-month time series of hourly measurements from four locations in Western Australia (WA).
The performance indicators agreed on the best imputation method across the five folds of artificial gaps of missing observations.
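To illustrate imputation by state-space methods with Kalman smoothing, the sketch below uses the simplest structural time series model, a local level (random walk plus noise) model, with a Kalman filter and Rauch-Tung-Striebel smoother written in plain Python. It is a simplified stand-in and not the study's structural or ARIMA model. The variances `q` and `h` are fixed by hand rather than estimated by maximum likelihood, and the function names are hypothetical. A missing value (`None`) skips the filter's update step, and the smoothed level is used as the imputed value.

```python
def local_level_smooth(y, q, h, a0=0.0, p0=1e7):
    """Kalman filter plus Rauch-Tung-Striebel smoother for a local level model.

    Model (assumed): y[t] = mu[t] + eps[t],    eps ~ N(0, h)
                     mu[t+1] = mu[t] + eta[t], eta ~ N(0, q)
    y holds floats, with None marking missing observations.
    a0, p0: mean and (large, near-diffuse) variance of the initial level.
    Returns the smoothed level E[mu[t] | all observed y] for every t.
    """
    n = len(y)
    a_pred, p_pred, a_filt, p_filt = [], [], [], []
    a, p = a0, p0  # one-step-ahead prediction for t = 0
    for t in range(n):
        a_pred.append(a)
        p_pred.append(p)
        if y[t] is not None:
            k = p / (p + h)          # Kalman gain
            a = a + k * (y[t] - a)   # correct the level with the observation
            p = (1 - k) * p
        # If y[t] is missing, the filtered state equals the prediction.
        a_filt.append(a)
        p_filt.append(p)
        p = p + q                    # random-walk prediction for t + 1
    # Backward (RTS) pass: blend each filtered level with the smoothed future.
    smooth = a_filt[:]
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_pred[t + 1]
        smooth[t] = a_filt[t] + j * (smooth[t + 1] - a_pred[t + 1])
    return smooth


def impute_local_level(y, q=1.0, h=0.1):
    """Replace None entries of y with the smoothed level; keep observed values."""
    s = local_level_smooth(y, q, h)
    return [s[t] if v is None else v for t, v in enumerate(y)]
```

Using the smoother rather than the filter alone means each gap is informed by observations on both sides of it. For example, `impute_local_level([1.0, 2.0, None, 4.0, 5.0])` fills the gap with a value very close to 3.0.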
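In the regression approach, a missing value of one variable (for example temperature) is predicted from other variables observed at the same hour (for example humidity and wind speed). The sketch below fits ordinary least squares with an intercept on the time steps where the target is observed. It solves the normal equations by Gauss-Jordan elimination in plain Python. It assumes that the predictor columns are fully observed at the time steps being imputed. `fit_ols` and `impute_mlr` are hypothetical names, and the study's actual predictor sets are not reproduced here.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X: sequence of predictor rows, y: sequence of responses.
    Returns [b0, b1, ..., bp] where b0 is the intercept.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    m = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(m):
        piv = max(range(c, m), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        for i in range(m):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][m] / A[i][i] for i in range(m)]


def impute_mlr(target, predictors):
    """Fill None entries of target with OLS predictions from predictors.

    predictors[t] is the row of other variables at time t (assumed observed).
    The model is fitted only on time steps where target is observed.
    """
    obs = [t for t, v in enumerate(target) if v is not None]
    beta = fit_ols([predictors[t] for t in obs], [target[t] for t in obs])

    def predict(row):
        return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

    return [predict(predictors[t]) if v is None else v
            for t, v in enumerate(target)]
```

The normal equations are adequate for a handful of well-scaled predictors. With many or strongly collinear predictors, a QR-based least-squares solver would be the more robust choice.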
Summary
Climatic conditions such as precipitation, temperature, humidity, wind speed, wind gust and sea level pressure have been used over time in many meteorological, energy application, agricultural, ecological and hydrological studies (Firat et al, 2012; Xu et al, 2013; Lara-Estrada et al, 2018). The threats of global warming and climate change (World Bank, 2012) have sparked renewed interest in the analysis and inference of climatic variables and related subjects in the natural, social and political sciences. Processing and analysing data at a high resolution, such as every h minutes (h ≤ 60), yields an appreciable number of data points even when the overall period under investigation is short; e.g. an hourly time series covering a leap year consists of 8,784 (366 × 24) data points. The analysis of high-resolution data offers a greater ability to understand the nature of data variability, behaviours and trends, and to detect small changes (Pincetl et al, 2015). However, missing observations in climate data often occur consecutively over long periods of time (Simolo et al, 2010).
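The count quoted above follows from days × 24 × 60 / h. The hypothetical helper below makes the arithmetic explicit for hourly (h = 60) and finer resolutions. It assumes that h divides 60 evenly, and 2016 serves only as an example leap year.

```python
import calendar


def points_per_year(year, h_minutes=60):
    """Number of regularly spaced observations in a calendar year at an
    h-minute resolution (assumes 1 <= h <= 60 and h divides 60)."""
    if not (1 <= h_minutes <= 60 and 60 % h_minutes == 0):
        raise ValueError("h_minutes must divide 60 and satisfy 1 <= h <= 60")
    days = 366 if calendar.isleap(year) else 365
    return days * 24 * 60 // h_minutes
```

For example, `points_per_year(2016)` gives 8784 and `points_per_year(2015)` gives 8760. At a 10-minute resolution, the 2016 figure rises to 52,704.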