IMPUTATION OF CONTIGUOUS GAPS AND EXTREMES OF SUBHOURLY GROUNDWATER TIME SERIES USING RANDOM FORESTS

Dipankar Dwivedi, Kenneth H Williams, Deborah Agarwal, Susan S Hubbard, Utkarsh Mital, Carl I Steefel, Baptiste Dafflon, Boris Faybishenko, Charuleka Varadharajan

doi:10.1615/jmachlearnmodelcomput.2021038774

Open Access

Abstract

Machine learning can provide sustainable solutions for gap-filling the groundwater (GW) data needed to adequately constrain watershed models. However, imputing missing extremes is more challenging than imputing other parts of a hydrograph. To impute missing subhourly data, including extremes, in GW time series collected at multiple wells in the East River watershed in southwestern Colorado, we consider a single-well imputation (SWI) approach and a multiple-well imputation (MWI) approach. SWI gap-fills missing GW entries in a well using that well's own time series; MWI gap-fills a well's missing GW entries using the time series of neighboring wells. SWI uses both linear interpolation and random forest (RF) approaches, whereas MWI uses only the RF approach. We also use an information entropy framework to develop insights into how missing-data patterns affect imputation. We found that if gaps occurred at random intervals, SWI could accurately impute up to 90% of missing data over an approximately two-year period. Contiguous gaps constituted more complex scenarios for imputation and required the use of MWI. Information entropy suggested that if gaps were contiguous, up to 50% of missing GW data could be estimated accurately over an approximately two-year period. The RF feature importance suggested that a time feature (months) and a space feature (neighboring wells) were the most important predictors in SWI and MWI. We also noted that neither the SWI nor the MWI method could capture the missing extremes of a hydrograph. To counter this, we developed a new sequential approach and demonstrated the imputation of missing extremes in a GW time series with high accuracy.
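As a rough illustration of the linear-interpolation component of SWI and of the difference between isolated and contiguous gaps, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the function names `gap_runs` and `linear_fill`, the use of `None` to mark missing entries, and the sample values are all assumptions made for this example, and the RF and information entropy steps are not shown.

```python
from typing import List, Optional, Tuple

Series = List[Optional[float]]


def gap_runs(series: Series) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive missing values."""
    runs = []
    start = None
    for i, value in enumerate(series):
        if value is None:
            if start is None:
                start = i  # a new gap begins here
        elif start is not None:
            runs.append((start, i - start))  # the gap ended just before i
            start = None
    if start is not None:
        runs.append((start, len(series) - start))  # gap runs to the end
    return runs


def linear_fill(series: Series) -> Series:
    """Fill interior gaps by linear interpolation between the nearest
    observed neighbors. Leading and trailing gaps are left as None,
    because they have no observation on one side."""
    filled = list(series)
    for start, length in gap_runs(series):
        left, right = start - 1, start + length
        if left < 0 or right >= len(series):
            continue  # gap touches an edge of the record; cannot interpolate
        y0, y1 = series[left], series[right]
        for k in range(1, length + 1):
            filled[start + k - 1] = y0 + (y1 - y0) * k / (length + 1)
    return filled


# Hypothetical water-level readings at a single well (None = missing).
levels = [2.0, None, 3.0, 3.5, None, None, None, 5.5, None]
runs = gap_runs(levels)       # one isolated gap, one contiguous gap, one trailing gap
filled = linear_fill(levels)  # interior gaps filled, trailing gap left as None
```

Every interpolated value lies between the two observations that bracket its gap, so a peak or trough that falls inside a gap cannot be recovered this way. That is consistent with the abstract's remark that missing extremes need a separate treatment, and the run lengths from `gap_runs` give a simple way to tell scattered gaps from long contiguous ones.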

Full Text