Abstract

Accurate predictive models of daily global horizontal irradiation are essential for diverse solar energy applications, and their long-term performance can be assessed using average years. This study scrutinized 70 machine learning and 44 empirical models using two disjoint five-year average daily training and validation datasets, each comprising 365 records and 10 features. The features were day number, minimum and maximum air temperature, air temperature amplitude, theoretical and observed sunshine hours, theoretical extraterrestrial horizontal irradiation, relative sunshine, cloud cover and relative humidity. Fourteen machine learning algorithms, namely multiple linear regression, ridge regression, lasso regression, elastic net regression, Huber regression, k-nearest neighbors, decision tree, support vector machine, multilayer perceptron, extreme learning machine, generalized regression neural network, extreme gradient boosting, gradient boosting machine and light gradient boosting machine, were trained, validated and then instantiated as base learners in four strategically designed homogeneous parallel ensembles (variants of pasting, random subspace, bagging and random patches), which were also scrutinized, yielding 70 models in total. Selected hyperparameters of each algorithm were optimized. Validation showed that, for each base learner, at least two of its ensembles outperformed the individual model. The Huber-subspace ensemble ranked first with a root mean square error of 1.495 MJ/m²/day. The multilayer perceptron was the most robust to the random perturbations introduced by the ensembles, which suggests good tolerance to noise in ground-truth data. The best empirical model returned a validation root mean square error of 1.595 MJ/m²/day but was outperformed by 93% of the machine learning models, with the homogeneous parallel ensembles producing the most accurate predictions.
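The four homogeneous parallel ensemble variants named above differ only in whether training rows, features, or both are randomly resampled for each copy of the base learner. As a rough sketch (not the authors' code), this is one way to express those variants with scikit-learn's BaggingRegressor; the Huber base learner, the 0.7 subsampling fractions and the 50-estimator ensemble size are illustrative assumptions, not values reported in the study.

```python
# Minimal sketch of the four homogeneous parallel ensemble variants
# (pasting, bagging, random subspace, random patches) as configurations
# of scikit-learn's BaggingRegressor. Illustrative only.
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import HuberRegressor


def make_ensemble(variant, base=None, n_estimators=50, random_state=0):
    """Return a homogeneous parallel ensemble of `base` for the named variant."""
    base = base or HuberRegressor()  # assumed base learner for illustration
    settings = {
        # pasting: row subsets drawn without replacement, all features kept
        "pasting": dict(max_samples=0.7, bootstrap=False),
        # bagging: full-size bootstrap row samples (with replacement), all features kept
        "bagging": dict(max_samples=1.0, bootstrap=True),
        # random subspace: all rows kept, random feature subsets per estimator
        "random_subspace": dict(max_features=0.7, bootstrap=False,
                                bootstrap_features=False),
        # random patches: both rows and features subsampled per estimator
        "random_patches": dict(max_samples=0.7, max_features=0.7,
                               bootstrap=True, bootstrap_features=True),
    }
    # `estimator` requires scikit-learn >= 1.2; older versions use `base_estimator`
    return BaggingRegressor(estimator=base, n_estimators=n_estimators,
                            random_state=random_state, **settings[variant])


# e.g. a Huber-subspace ensemble analogous to the top-ranked model;
# train and validate with ensemble.fit(X_train, y_train) / ensemble.predict(X_val)
huber_subspace = make_ensemble("random_subspace")
```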