Accurate and fast estimation of reference evapotranspiration (ET0) is important for determining crop water requirements, designing irrigation schedules, and planning and managing agricultural water resources, especially when limited meteorological data are available. This study proposed a novel kernel extreme learning machine model coupled with K-means clustering and the firefly algorithm (Kmeans-FFA-KELM), using 5, 10, 15, 20, 25, 30 and 40 data subsets, to estimate monthly mean daily ET0 by parallel computation in the Poyang Lake basin of South China with pooled temperature data from 26 weather stations. Two input combinations were considered: (1) mean temperature (Tavg) and extraterrestrial radiation (Ra), and (2) maximum and minimum temperatures (Tmax and Tmin) and Ra. Meteorological data from 1966–2000 were used to train the models, and data from 2001–2015 were used for testing. The results showed that, during testing, the selected machine learning models with Tmax, Tmin and Ra as inputs had RMSE values 7.0–15.5% lower than those with Tavg and Ra. The FFA-KELM model slightly outperformed the adaptive network-based fuzzy inference system (ANFIS) model; both were superior to the random forest (RF) and M5 prime model tree (M5P) models, followed by the Hargreaves and Thornthwaite models. The RMSE values of the Kmeans-FFA-KELM models with more than 20 subsets were 0.7–3.5% lower than those of the FFA-KELM models. The Kmeans-FFA-KELM model with 25 subsets (FFA-KELM-25) outperformed the FFA-KELM model in summer and produced fewer absolute errors greater than 0.9 mm d⁻¹. The computational time of the Kmeans-FFA-KELM models first decreased and then increased as the number of subsets increased. The parallel FFA-KELM-25 model (0.5–0.7 s) significantly reduced the computational time: it was 10–13 times faster than the sequential Kmeans-FFA-KELM model (7.0–7.4 s) and 1185–1603 times faster than the FFA-KELM model (802.2–830.0 s). This study provides a new and fast modeling method for processing large datasets in regional-scale agricultural and water resources studies.
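For readers unfamiliar with the empirical baseline, the following is a minimal Python sketch of a temperature-based Hargreaves estimate and of the RMSE metric used for comparison. It assumes the commonly cited Hargreaves-Samani form with coefficients 0.0023 and 17.8, Ra in MJ m⁻² d⁻¹, and Tavg taken as the mean of Tmax and Tmin; the abstract does not state which variant or calibration the study used, and the function names are illustrative only.

```python
import math


def hargreaves_et0(tmax_c, tmin_c, ra_mj):
    """Hargreaves-Samani style ET0 estimate in mm per day.

    tmax_c, tmin_c: maximum and minimum air temperature (deg C)
    ra_mj: extraterrestrial radiation (MJ m^-2 d^-1)
    Assumption: standard uncalibrated coefficients (0.0023, 17.8).
    """
    if tmax_c < tmin_c:
        raise ValueError("tmax_c must be >= tmin_c")
    # Assumption: mean temperature approximated from the two extremes.
    tavg_c = (tmax_c + tmin_c) / 2.0
    # 0.408 converts MJ m^-2 d^-1 into equivalent evaporation (mm d^-1).
    ra_mm = 0.408 * ra_mj
    return 0.0023 * ra_mm * (tavg_c + 17.8) * math.sqrt(tmax_c - tmin_c)


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the study's dataset.
    print(round(hargreaves_et0(30.0, 20.0, 40.0), 2))
    print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 3))
```

The same three inputs (Tmax, Tmin and Ra) feed the second input combination of the machine learning models, which is why the Hargreaves model serves as a natural like-for-like baseline.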