Abstract
Statistical learning, which includes machine learning (ML) and deep learning (DL) algorithms, seeks to learn statistics-based rules for data analysis tasks from known examples of inputs, or features, and corresponding outcomes. Data sets that are noisy, include significant uncertainty, and have extreme values hinder the learning process. In this study, aquifer recharge predictors are developed using four ML methods based on random forests or gradient boosting and Long Short-Term Memory (LSTM) networks, a DL method, to: (1) examine predictive skill when trained using noisy and uncertain data and (2) identify advantages of statistical learning implementations for prediction of water budget outcomes relative to process-based water budget calculations. Recharge was selected as the learning outcome because it is not observed and is inherently uncertain. Precipitation, potential evapotranspiration (PET), and river discharge are the features, or inputs. They are calculated, or modelled, values rather than direct observations; consequently, they are expected to be noisy and uncertain because of contamination with measurement and model error. A common-sense baseline is developed and implemented to account for uncertainty and noise in outcomes for training and validation. The baseline delineates a lower goodness-of-fit threshold, which identifies when trained ML and DL models generate predictive skill, and an upper goodness-of-fit threshold, above which the models are learning to reproduce noise and bias. For statistical learning regression implementations, features and outcomes need to be transformed to be Gaussian-like. Inherent variability and extreme events in precipitation, discharge, and recharge data sets require power transformation, or at least scaling of logarithms, to enhance predictive skill.
The identified advantages of statistical learning for water budget outcomes are the ability to use dimensionless trends for features and the ability to represent a complex study site with the same level of effort as a simple site.
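As a rough illustration of the "scaling of logarithms" mentioned above, the sketch below applies a log transform followed by standardization to a strongly right-skewed series and compares skewness before and after. It is not the study's implementation: the synthetic log-normal "precipitation" series, the choice of log(1 + x), and the helper names `skewness` and `scaled_log` are all assumptions made for illustration.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def scaled_log(values):
    """Apply log(1 + x), then standardize to zero mean and unit variance."""
    logged = [math.log1p(v) for v in values]
    mean = statistics.fmean(logged)
    std = statistics.pstdev(logged)
    return [(v - mean) / std for v in logged]


# Hypothetical stand-in for a precipitation series: non-negative,
# right-skewed, with occasional extreme values (log-normal draws).
random.seed(0)
precip = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]

raw_skew = skewness(precip)
transformed = scaled_log(precip)
new_skew = skewness(transformed)

print(f"skewness before: {raw_skew:.2f}, after: {new_skew:.2f}")
```

The transformed series should be noticeably closer to symmetric than the raw one. A fitted power transformation such as Box-Cox or Yeo-Johnson would typically bring it closer still to a Gaussian shape, at the cost of estimating an extra parameter from the training data.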