Abstract
Machine learning has been widely used in hydrological modeling. However, whether to train a model on all available data or only on a specific subset, and the implications of that choice, have rarely been investigated explicitly. As a case study, we combined evapotranspiration (ET) observations from 168 flux stations with meteorological and biophysical variables, and used Random Forests to separately construct an 'All data' model trained with all data and six 'plant functional type (PFT) specific' models, each trained with data from one PFT (i.e., Forest, Grassland, Cropland, Shrubland, Savannah, Wetland). We found that ET simulations are transferable between different PFTs. The 'All data' model captured ET better and had a higher R-squared at 94 of the 168 sites, especially in the Wetland, Shrubland, Cropland, and Grassland types. Compared with the 'All data' model, the 'PFT specific' model can further improve accuracy at grassland sites with high R-squared by reducing confusion from other PFTs and constraining the variance of the training data. When shifting from the 'All data' model to the 'PFT specific' model, an increase in the degree of encapsulation of the training set into the prediction set leads to a decrease in R-squared. Accuracy pre-evaluation may therefore be necessary before applying models trained on either all data or a data subset.
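The site-by-site comparison described above can be illustrated with a minimal sketch. It assumes R-squared is computed as the coefficient of determination (the abstract does not say which definition was used), and the function names, site names, and values are hypothetical toy data, not results from the study.

```python
from typing import Dict, List, Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


def sites_favoring_all_data(
    observed: Dict[str, Sequence[float]],
    all_data_pred: Dict[str, Sequence[float]],
    pft_pred: Dict[str, Sequence[float]],
) -> List[str]:
    """Return the sites where the 'All data' model has the higher R-squared."""
    return [
        site
        for site, obs in observed.items()
        if r_squared(obs, all_data_pred[site]) > r_squared(obs, pft_pred[site])
    ]


# Hypothetical toy ET values for two made-up sites (illustration only).
observed = {"site_A": [1.0, 2.0, 3.0, 4.0], "site_B": [2.0, 4.0, 6.0]}
all_data_pred = {"site_A": [1.1, 1.9, 3.2, 3.8], "site_B": [3.0, 4.0, 5.0]}
pft_pred = {"site_A": [1.5, 1.5, 3.5, 3.5], "site_B": [2.0, 4.5, 6.0]}

wins = sites_favoring_all_data(observed, all_data_pred, pft_pred)
print(f"'All data' model better at {len(wins)} of {len(observed)} sites: {wins}")
```

Applied to real predictions from the two kinds of model, a count of this form would correspond to the "94 of 168 sites" figure reported above.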