Abstract

Current machine learning methods for discharge prediction often employ aggregated basin-wide hydrometeorological data (lumped modeling) for parametric and non-parametric training. This approach may overlook the spatial heterogeneity of river systems and its impact on discharge patterns. We hypothesize that integrating spatiotemporal hydrologic knowledge into the data modeling process (distributed/disaggregated modeling) can improve the performance of discharge prediction models. To test this hypothesis, we designed experiments comparing the performance of identical Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) models forced with either lumped or distributed features. We gathered meteorological forcing and static attributes for the Mackenzie basin in Canada, a large and unique basin. Importantly, prediction performance is assessed out-of-sample with k-fold replication across gauges. Training LSTMs with disaggregated data significantly improved model accuracy. Specifically, there was a 9.6% increase in the mean Nash-Sutcliffe Efficiency and a 4.6% increase in the mean Kling-Gupta Efficiency, indicating better agreement between predicted and observed discharge in terms of mean, variability, and correlation. These experiments and results demonstrate the importance of integrating topologically guided geomorphologic and hydrologic information (distributed modeling) in data-driven discharge predictions.
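For readers unfamiliar with the two skill scores reported above, the following is a minimal sketch of how they are commonly computed for a single gauge. It assumes the standard Nash-Sutcliffe definition and the original 2009 formulation of the Kling-Gupta Efficiency (correlation, variability ratio, and bias ratio); the abstract does not state which KGE variant was used, and the function names `nse` and `kge` are illustrative rather than taken from the study's code.

```python
import math
from statistics import fmean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = fmean(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance_sum = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance_sum


def kge(obs, sim):
    """Kling-Gupta Efficiency (assumed 2009 formulation); 1 is a perfect fit."""
    mu_o, mu_s = fmean(obs), fmean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    # Pearson correlation between observed and simulated discharge
    cov = fmean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)
    alpha = sd_s / sd_o  # variability ratio
    beta = mu_s / mu_o   # bias (mean) ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical discharge values, for illustration only
observed = [12.0, 15.5, 30.2, 22.1, 18.4]
simulated = [11.0, 16.0, 27.5, 23.0, 19.1]
print(f"NSE = {nse(observed, simulated):.3f}, KGE = {kge(observed, simulated):.3f}")
```

The three terms inside the KGE square root correspond to the correlation, variability, and mean components mentioned in the abstract, which is why an improvement in KGE is read as better agreement on all three.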