Distributional Regression Forests Approach to Regional Frequency Analysis With Partial Duration Series

K G Kiran, V V Srinivas

doi:10.1029/2021wr029909

Abstract

Regional flood frequency analysis (RFFA) is widely used to quantify flood risk at ungauged and sparsely gauged locations. Few attempts have been made to use partial duration series (PDS) for RFFA, although PDS can offer some advantages over the widely used annual maximum series (AMS). This article contributes two novel random/regression forests (RFs)‐based methodologies for RFFA with PDS, namely generalized Pareto distribution (GPD)‐based distributional RFs (DRFs) and multivariate RFs (MVRFs). RFs facilitate modeling the interactions between predictors and their complex relationships with the predictands without explicitly specifying them. The DRFs and MVRFs each comprise an ensemble of corresponding regression trees, with every tree constructed by recursive binary partitioning of the feature space into meaningful segments. The proposed DRFs account for the sampling uncertainty of PDS in both the partitioning and the parameter estimation. In DRFs (MVRFs), quantile estimates for an ungauged site are obtained using maximum likelihood estimates (expected values) of the GPD parameters corresponding to the segments to which the site belongs. The potential of DRFs and MVRFs relative to two recently proposed techniques (univariate RFs‐based quantile regression and a generalized additive model based on GPD) is demonstrated through Monte‐Carlo simulation experiments and a study on 1,031 watersheds in the United States. The key features influencing the scale and shape parameters of the GPD fitted to the PDS of the watersheds are identified as drainage area and 24‐hr rainfall intensity corresponding to a 2‐year return period, respectively. The key features identified for the shape parameter differ from those known from analyses based on AMS and the generalized extreme value distribution.
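To illustrate how GPD parameters translate into a flood quantile, the following is a minimal sketch of the standard peaks-over-threshold quantile formula, not the authors' implementation. The function name, the threshold, the mean annual exceedance rate, and the sign convention for the shape parameter (positive shape meaning a heavy upper tail) are all assumptions introduced here; the abstract does not specify them.

```python
import math


def gpd_pds_quantile(return_period_years, threshold, scale, shape, rate_per_year):
    """T-year quantile from a GPD fitted to exceedances over a threshold.

    Hypothetical helper (not from the paper). Assumes exceedances arrive at a
    mean rate of `rate_per_year` and that shape > 0 denotes a heavy tail.
    """
    if scale <= 0 or rate_per_year <= 0:
        raise ValueError("scale and rate_per_year must be positive")

    # Expected number of threshold exceedances within the return period.
    m = rate_per_year * return_period_years
    if m < 1:
        raise ValueError("return period too short for this exceedance rate")

    # Shape near zero: the GPD reduces to the exponential distribution.
    if abs(shape) < 1e-9:
        return threshold + scale * math.log(m)

    return threshold + (scale / shape) * (m ** shape - 1.0)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the study.
    q10 = gpd_pds_quantile(10, threshold=100.0, scale=50.0, shape=0.1, rate_per_year=2.0)
    print(f"10-year quantile: {q10:.1f}")
```

In a forest-based setting such as the one described above, the scale and shape passed to a function like this would come from the estimates associated with the segments to which the ungauged site belongs; how the threshold and exceedance rate are obtained for such a site is left open in this sketch.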

Full Text