In this paper, ensemble-based machine learning models using gradient boosting machines and random forests are proposed for predicting the power production of six different solar PV systems. The models are based on three years of performance data from a 1.2 MW grid-integrated solar photovoltaic (PV) power plant. After the data were cleaned of errors and outliers, the model features were chosen on the basis of principal component analysis. The accuracies of the developed models were tested and compared with those of models based on other supervised learning algorithms, namely k-nearest neighbour and support vector machines. Though the accuracies varied with the type of PV system, the machine learning models developed in the study generally performed well in predicting the power output of different solar PV technologies under varying working environments. For example, the average root mean square errors of the models based on gradient boosting machines, random forest, k-nearest neighbour, and support vector machines were 17.59 kW, 17.14 kW, 18.74 kW, and 16.91 kW, respectively. The corresponding average mean absolute errors were 8.28 kW, 7.88 kW, 14.45 kW, and 6.89 kW. Comparing the different modelling methods, the decision-tree-based ensemble algorithms and the support vector machine models outperformed the approach based on the k-nearest neighbour method. Given their high accuracy and lower computational cost compared with deep learning approaches, the proposed ensemble models could be good options for PV performance prediction in real-time and near-real-time applications.
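The two error metrics quoted above can be illustrated with a minimal sketch. The function names `rmse` and `mae` and the sample power values below are hypothetical, chosen only for illustration; they are not taken from the study's data or code.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical measured and predicted PV power values in kW (not from the paper).
actual_kw = [100.0, 150.0, 200.0, 250.0]
predicted_kw = [110.0, 140.0, 205.0, 250.0]

print(rmse(actual_kw, predicted_kw))  # 7.5
print(mae(actual_kw, predicted_kw))   # 6.25
```

Because squaring weights large residuals more heavily, the RMSE is never smaller than the MAE for the same predictions, which is consistent with the pattern in the figures reported above.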