Abstract

The National Oceanic and Atmospheric Administration has developed a very high-resolution streamflow forecast using the National Water Model (NWM) for 2.7 million stream locations in the United States. However, quantifying forecast uncertainty and reliability at ungauged locations remains a considerable challenge. A data science approach is presented to address this challenge. Long-range daily streamflow forecasts from December 2018 to August 2021 are analyzed for Alabama and Georgia. The forecast is evaluated at 389 USGS stream gauging locations using standard deterministic metrics. Next, the forecast errors are grouped by watershed biophysical characteristics, including drainage area, land use, soil type, and topographic index. The NWM forecasts are more skillful for larger and forested watersheds than for smaller and urban watersheds, and the NWM considerably overestimates streamflow in urban watersheds. A classification and regression tree analysis confirms the dependency of the forecast errors on the biophysical characteristics. A densely connected, six-layer neural network model [deep learning (DL)] is developed with the biophysical characteristics and the NWM forecast as inputs and the forecast errors as outputs. The DL model successfully learns location-invariant, transferable knowledge from the gauged locations on which it is trained and applies the learned model to estimate forecast errors at ungauged locations. A temporal and spatial split of the gauged data shows that the probability of capturing the observations within the forecast range is significantly higher for the hybrid NWM-DL model (82% ± 3%) than for the NWM-only forecast (21% ± 1%). A trade-off is noted between the overly constrained NWM forecast and the wider forecast uncertainty range of the DL model.
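The abstract does not name its "standard deterministic metrics". The sketch below assumes two common ones, the Nash–Sutcliffe efficiency (NSE) and percent bias (PBIAS), and shows how per-gauge scores might be averaged within a watershed class such as land use. The function names, the dictionary keys (`cls`, `obs`, `sim`), and the toy series are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict
from statistics import mean


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; <= 0 is no better than the observed mean."""
    obs_mean = mean(obs)
    den = sum((o - obs_mean) ** 2 for o in obs)
    if den == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def pbias(obs, sim):
    """Percent bias, using the convention (sim - obs): positive means overestimation."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def metrics_by_class(gauges):
    """Average NSE and PBIAS over all gauges that share a (hypothetical) watershed class."""
    groups = defaultdict(list)
    for g in gauges:
        groups[g["cls"]].append((nse(g["obs"], g["sim"]), pbias(g["obs"], g["sim"])))
    return {
        cls: {"nse": mean(v[0] for v in vals), "pbias": mean(v[1] for v in vals)}
        for cls, vals in groups.items()
    }


# Toy daily flows for two hypothetical gauges (illustrative values only).
gauges = [
    {"cls": "forested", "obs": [10, 12, 14, 11], "sim": [11, 12, 13, 11]},
    {"cls": "urban", "obs": [2, 3, 2, 3], "sim": [4, 5, 4, 5]},
]
result = metrics_by_class(gauges)
print(result)
```

With these toy series the urban gauge gets a strongly negative NSE and a large positive PBIAS, the signature of systematic overestimation, while the forested gauge scores well on both.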
Significance Statement

A hybrid biophysical–artificial intelligence (physics–AI) model is developed from first principles to estimate streamflow forecast errors at ungauged locations, improving the forecast's reliability. Here, first principles means identifying the need for a hybrid physics–AI model, determining physically interpretable and machine-identifiable model inputs, developing and evaluating the deep learning (DL) model, and finally providing a biophysical interpretation of the hybrid model. The very high-resolution National Water Model (NWM) forecast, developed by the National Oceanic and Atmospheric Administration, serves as the biophysical component of the hybrid model. Of the 2.7 million daily forecasts, fewer than 1% can be verified with the traditional hydrological method of comparing the forecast with observations, which motivates the use of an AI technique to improve forecast reliability at millions of ungauged locations. An exploratory analysis followed by a classification and regression tree analysis establishes the dependency of the forecast errors on the biophysical attributes, which, along with the NWM forecast, are used to develop the DL model. The hybrid model is evaluated in the humid subtropical climate of Alabama and Georgia in the United States. Long-range streamflow forecasts with lead times from 0 to 30 days are archived and analyzed for 979 days (December 2018–August 2021) at 389 USGS gauging stations. Forecast reliability is assessed as the probability of capturing the observations within the ensemble range. The forecast reliability increased from 21% (±1%) for the NWM-only forecasts to 82% (±3%) for the hybrid physics–AI model.
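Reliability, as defined above, is the share of observations that fall inside the forecast's ensemble range. The sketch below assumes that range runs from the minimum to the maximum ensemble member on each day; `capture_probability`, `mean_range_width`, and the toy values are hypothetical. The width function is included because widening the range raises the capture probability at the cost of sharpness, which is the trade-off noted between the NWM and DL forecasts.

```python
def capture_probability(observations, ensembles):
    """Fraction of observations lying within the min-max range of their ensemble."""
    if not observations or len(observations) != len(ensembles):
        raise ValueError("need one non-empty ensemble per observation")
    hits = sum(min(ens) <= obs <= max(ens) for obs, ens in zip(observations, ensembles))
    return hits / len(observations)


def mean_range_width(ensembles):
    """Average width (max - min) of the ensemble ranges; smaller means a sharper forecast."""
    return sum(max(ens) - min(ens) for ens in ensembles) / len(ensembles)


# Toy daily observations and two hypothetical forecast ensembles (illustrative only).
obs = [5.0, 7.0, 6.0, 9.0]
narrow = [[4.8, 5.2], [6.0, 6.2], [6.1, 6.3], [8.9, 9.1]]  # sharp, but misses two days
wide = [[3, 7], [5, 9], [4, 8], [7, 11]]                   # captures every day, less sharp

print(capture_probability(obs, narrow), mean_range_width(narrow))
print(capture_probability(obs, wide), mean_range_width(wide))
```

Reporting coverage alongside range width guards against a trivially "reliable" forecast whose range is too wide to be useful.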
