Abstract Accurate prediction of snow water equivalent (SWE) is valuable for water resource managers. Recently, deep learning methods such as long short-term memory (LSTM) networks have exhibited high accuracy in simulating hydrologic variables and can integrate lagged observations to improve prediction, but their benefits for SWE simulation have been unclear. Here we tested an LSTM network with data integration (DI) for SWE across the western United States, integrating 30-day-lagged or 7-day-lagged observations of either SWE or satellite-observed snow cover fraction (SCF) to improve future predictions. SCF proved beneficial only for shallow-snow sites during snowmelt, while lagged-SWE integration significantly improved prediction accuracy for both shallow- and deep-snow sites. The median Nash–Sutcliffe model efficiency coefficient (NSE) in temporal testing improved from 0.92 to 0.97 with 30-day-lagged SWE integration, and the root-mean-square error (RMSE) and the difference between estimated and observed peak SWE values (d_max) were reduced by 41% and 57%, respectively. DI effectively mitigated accumulated model and forcing errors that would otherwise persist. Moreover, by applying DI to observations with different lags (30-day and 7-day), we revealed the spatial distribution of errors with different persistence lengths. For example, integrating 30-day-lagged SWE was ineffective for ephemeral-snow sites in the southwestern United States but significantly reduced monthly-scale biases in regions with stable seasonal snowpack, such as high-elevation sites in California. These biases are likely attributable to large interannual variability in snowfall or site-specific snow redistribution patterns, which can accumulate to impactful levels over time at nonephemeral sites. These results establish benchmark levels and provide guidance for future model improvement strategies.
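The evaluation metrics named in the abstract (NSE, RMSE, and the peak-SWE difference) can be sketched as follows. This is a minimal illustration of the standard metric definitions, not the paper's evaluation code; the function names and the synthetic SWE series are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def peak_swe_diff(obs, sim):
    """Difference between simulated and observed peak SWE (a d_max-style metric)."""
    return float(np.max(sim) - np.max(obs))

# Illustrative synthetic seasonal SWE curves (mm), not real station data
t = np.arange(180)
obs = np.clip(5.0 * t - 0.04 * t ** 2, 0, None)  # idealized accumulation and melt
sim = obs * 0.95 + 10.0                          # a slightly biased "model"

print(nse(obs, sim), rmse(obs, sim), peak_swe_diff(obs, sim))
```

A perfect simulation gives NSE = 1; values near zero mean the model is no better than the observed mean, which is why the reported jump from 0.92 to 0.97 represents a substantial error reduction.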