Abstract
Pan evaporation (Epan) from a class A pan evaporimeter under local semi-arid conditions was modelled from meteorological observations using an integrated regression approach with three steps: a) first step: selection of appropriate transformations to reduce departures from normality in the independent variables, followed by ridge regression to select variables with low collinearity based on variance inflation factors; b) second step (RCV-REG): regression (REG) of the final model on the selected, transformed, low-collinearity variables, implemented through an iterative procedure called “Random Cross-Validation” (RCV) that repeatedly splits the data at random into calibration and validation subsets; c) third step: robustness control of the regression coefficients estimated by RCV-REG, by analyzing the sign (+ or -) variation of their iterative solutions using the 95% interval of their Highest Posterior Density Distribution (HPD). The iterative RCV procedure can also be applied to machine learning (ML) methods, so the Random Forests (RF) method was also run with RCV (RCV-RF) as an additional case for comparison with RCV-REG. In RCV-REG, the data were randomly split into calibration and validation sets (70% and 30%, respectively) 1,000 times, yielding 1,000 solutions for the regression coefficients. The same number of iterations and the same random splitting for validation were used in RCV-RF.
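The RCV-REG and sign-robustness steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ordinary least squares stands in for the final regression, the HPD interval is taken empirically as the narrowest window holding 95% of the iterative solutions, and the function names (`hpd_interval`, `rcv_regression`, `sign_stable`) are hypothetical. The transformation and ridge/VIF variable-selection step is assumed to have been done beforehand.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Empirical HPD: the narrowest interval holding `mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # points that must fall inside
    widths = x[k - 1:] - x[: n - k + 1]   # width of each candidate window
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def rcv_regression(X, y, n_iter=1000, calib_frac=0.7, seed=0):
    """Random Cross-Validation with an OLS model (assumed stand-in for REG).

    Returns the coefficients (intercept first) and the validation RMSE
    obtained in each random calibration/validation split.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_cal = int(round(calib_frac * n))
    A = np.column_stack([np.ones(n), X])  # prepend intercept column
    coefs = np.empty((n_iter, A.shape[1]))
    rmse = np.empty(n_iter)
    for it in range(n_iter):
        idx = rng.permutation(n)          # fresh random split each iteration
        cal, val = idx[:n_cal], idx[n_cal:]
        beta, *_ = np.linalg.lstsq(A[cal], y[cal], rcond=None)
        coefs[it] = beta
        resid = y[val] - A[val] @ beta
        rmse[it] = np.sqrt(np.mean(resid ** 2))
    return coefs, rmse

def sign_stable(coefs, mass=0.95):
    """True where a coefficient's HPD interval excludes zero (constant sign)."""
    flags = []
    for j in range(coefs.shape[1]):
        lo, hi = hpd_interval(coefs[:, j], mass)
        flags.append(bool(lo > 0 or hi < 0))
    return flags
```

Calling `rcv_regression(X, y)` on an array of selected predictors and the observed Epan series gives one coefficient vector per split; `sign_stable` then flags which coefficients keep a constant sign across the 95% interval of their solutions.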
The results showed that RCV-REG outperformed RCV-RF on all model performance criteria, providing robust regression coefficients for the independent variables (constant signs across their 95% HPD intervals) and a better distribution of validation solutions in the iterative 1:1 plots than RCV-RF (RCV-REG: R2=0.843, RMSE=0.853, MAE=0.642, MAPE=0.081, NSE=0.836, Slope(1:1 plot)=0.998, Intercept(1:1 plot)=0.011; RCV-RF: R2=0.835, RMSE=0.904, MAE=0.689, MAPE=0.088, NSE=0.818, Slope(1:1 plot)=1.120, Intercept(1:1 plot)=-1.011; mean values over 1,000 iterations). Applying the RCV approach to different modelling methods removes the problem of subjectively splitting data into calibration and validation sets, provides a better evaluation of the final models, and enhances the competitiveness of typical regression models against machine learning models.
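For reference, the performance criteria listed above can be computed per validation split roughly as in the sketch below. Several choices here are assumptions, since the abstract does not define them: R2 is taken as the squared Pearson correlation, MAPE is expressed as a fraction rather than a percentage, and the 1:1-plot slope and intercept come from regressing predicted on observed values. The function name `performance_metrics` is hypothetical.

```python
import numpy as np

def performance_metrics(obs, pred):
    """Validation criteria for one split; observed values must be nonzero."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / obs))     # assumed: fraction, not percent
    # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2  # assumed: squared Pearson r
    # assumed: 1:1 plot fitted as predicted = slope * observed + intercept
    slope, intercept = np.polyfit(obs, pred, 1)
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE": mape,
            "NSE": nse, "Slope": slope, "Intercept": intercept}
```

Averaging each entry over all random splits would give mean values of the kind reported above.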