Accurate estimation of reference evapotranspiration (ET0) is important in hydrological cycle research and essential for agricultural water management and allocation. Application of the standard model (FAO-56 Penman-Monteith) to estimate ET0 is often restricted by the absence of the required meteorological data. Although many machine learning algorithms have been applied to model ET0 with fewer meteorological variables, most models are trained and tested using data from the same station, and their performance outside the training station is not evaluated. This study investigates the generalization ability of the random forest (RF) algorithm in modeling ET0 with different input combinations (corresponding to different missing-data circumstances) and compares it with the gene-expression programming (GEP) method, using data from 24 weather stations in a karst region of southwest China. The ET0 estimated by the FAO-56 Penman-Monteith model was used as the reference for evaluating the derived RF-based and GEP-based models, and the coefficient of determination (R2), Nash-Sutcliffe coefficient of efficiency (NSCE), root mean squared error (RMSE), and percent bias (PBIAS) were used as evaluation criteria. The results show that the derived RF-based generalized ET0 models can be successfully applied to model ET0 with complete and incomplete meteorological variables (R2, NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day−1, and −2.916% to 1.571%, respectively), and seven RF-based models corresponding to different incomplete-data circumstances are proposed. GEP-based generalized ET0 models are also proposed, and they produced promising results (R2, NSCE, RMSE and PBIAS ranged from 0.639 to 0.944, 0.636 to 0.942, 0.222 to 0.555 mm day−1, and −1.98% to 0.248%, respectively). 
Although the RF-based ET0 models performed slightly better than the GEP-based models, the GEP approach can give explicit expressions relating the dependent and independent variables, which is more convenient for irrigators with minimal computer skills. Therefore, we recommend applying the RF-based models in water balance research and the GEP-based models in agricultural irrigation practice. Moreover, model performance decreased over time owing to the impact of climate change on ET0. Finally, both methods can assess the importance of predictors; the order of importance of meteorological variables for ET0 in Guangxi is: sunshine duration, air temperature, relative humidity, and wind speed.
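The four evaluation criteria named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name evaluation_metrics is hypothetical, R2 is assumed to be the squared Pearson correlation, and PBIAS is assumed to follow the convention 100 × Σ(estimated − reference) / Σ(reference), so that positive values indicate overestimation. The abstract does not state which sign convention was used.

```python
import math


def evaluation_metrics(reference, estimated):
    """Return R2, NSCE, RMSE and PBIAS for paired daily ET0 values (mm/day).

    reference: ET0 from the benchmark model (e.g. FAO-56 Penman-Monteith)
    estimated: ET0 from the model under evaluation
    """
    if len(reference) != len(estimated) or len(reference) < 2:
        raise ValueError("need two equally long series with at least two values")

    n = len(reference)
    mean_ref = sum(reference) / n
    mean_est = sum(estimated) / n

    # Sums of squares and cross-products around the means.
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    cross = sum((r - mean_ref) * (e - mean_est)
                for r, e in zip(reference, estimated))
    # Sum of squared errors between estimate and reference.
    ss_err = sum((e - r) ** 2 for r, e in zip(reference, estimated))

    if ss_ref == 0 or ss_est == 0:
        raise ValueError("metrics are undefined for a constant series")

    r2 = cross ** 2 / (ss_ref * ss_est)   # assumed: squared Pearson correlation
    nsce = 1.0 - ss_err / ss_ref          # Nash-Sutcliffe efficiency
    rmse = math.sqrt(ss_err / n)          # same units as the inputs
    # Assumed sign convention: positive means overestimation.
    pbias = 100.0 * sum(e - r for r, e in zip(reference, estimated)) / sum(reference)

    return {"R2": r2, "NSCE": nsce, "RMSE": rmse, "PBIAS": pbias}
```

A constant offset between the two series leaves R2 at 1 while lowering NSCE and raising RMSE and PBIAS, which is one reason to report the four criteria together rather than R2 alone.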