Abstract

Demand for electricity is gradually increasing in many countries. Many studies have applied data mining techniques to electric power data in order to develop more effective energy management systems. However, one major challenge is how to compensate for parts of the collected dataset, such as power consumption, voltage, or electric current, that may be missing for a specific period of time. In the literature, several methods have been employed for imputation of the missing data, especially single feature value imputation. However, the performance of the different types of imputation methods, i.e., statistical and machine learning methods, when multiple features of electric power data are missing has not been fully explored. Moreover, variations in their imputation performance between the summer and non-summer seasons and across peak, off-peak, and semi-peak times have not been investigated. This paper compares the performance of five well-known imputation methods for processing electric power data: two statistical methods, the autoregressive integrated moving average (ARIMA) and linear interpolation (LI) models, and three machine learning methods, k-nearest neighbor (K-NN), multilayer perceptron (MLP), and support vector regression (SVR). The experimental results, based on electric power data for a two-year period in Taiwan, show that the machine learning methods generally perform better than the statistical ones, with K-NN and SVR performing the best. All of the imputation methods produced higher error rates during the summer season than during the non-summer seasons. Moreover, the machine learning methods (especially K-NN) are better choices for the imputation of missing data during peak times, whereas the statistical methods (especially LI) are better for off-peak and semi-peak times.
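
To make the two families of methods concrete, the following is a minimal sketch of one statistical method (LI) and one machine learning method (K-NN) applied to missing values. It is not the authors' implementation: the function names, the choice of Euclidean distance over the observed features, the use of complete rows only as neighbor candidates, and the small power/voltage readings are all illustrative assumptions.

```python
import math


def linear_interpolate(series):
    """Fill None gaps in one feature with straight lines between the nearest
    observed neighbours; gaps at either end copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def knn_impute(rows, k=2):
    """Fill None entries in multi-feature rows with the mean of that feature
    over the k nearest complete rows (assumption: Euclidean distance computed
    on the features that are observed in the incomplete row)."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def distance(candidate):
            return math.sqrt(sum((candidate[j] - row[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=distance)[:k]
        result.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return result


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical readings: a single power series with gaps, and
# (power, voltage) rows where one voltage value is missing.
power_with_gaps = [1.0, None, 3.0, None, None, 6.0]
readings = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.1, None]]

li_filled = linear_interpolate(power_with_gaps)
knn_filled = knn_impute(readings, k=2)
```

LI uses only the time ordering within a single feature, whereas K-NN borrows values from similar records across features. A comparison like the one described in the abstract would mask known values, impute them with each method, and score the results with an error measure such as `rmse`.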
