Accurate streamflow prediction and early flood warning are important but very challenging because the runoff process is complex and stochastic. To better capture the complexity of the hydrological processes, this study proposes a spatio-temporal deep learning model that integrates multiple sources of information: hydrology-related data from the Global Land Data Assimilation System (GLDAS), hydro-meteorological data, and streamflow data. The Maximal Information Coefficient (MIC) is used to assess the relationship between streamflow and the other variables and to reduce data dimensionality. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Variational Mode Decomposition (VMD) are combined to extract the most significant features of the predictors. A Temporal Convolutional Network (TCN) and a Gated Recurrent Unit (GRU) are combined for prediction, which enhances the model's robustness and balances short-term and long-term forecasting performance. Integrating the predictive sub-models with Random Forest (RF) improves overall performance. The framework is applied to 11 hydrological stations in the upper, middle, and lower reaches of the mainstream Jialing River basin in China. According to the performance measures, the proposed approach outperforms the baseline models. Flood, probabilistic, and lead-time predictions of streamflow make the model more widely applicable and robust. To increase interpretability, SHapley Additive exPlanations (SHAP) is used to analyze the contribution of each selected influencing variable to the long-term trend of streamflow.
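The TCN component mentioned above is built on causal dilated convolutions, in which each output depends only on current and past inputs. The following is a minimal sketch of that single operation in plain Python, not the model used in this study: the function names `causal_dilated_conv1d` and `receptive_field`, the kernel weights, and the dilation values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Causal dilated convolution (hypothetical helper, illustration only).

    Computes y[t] = sum_k kernel[k] * x[t - k * dilation], treating samples
    before the start of the series as zero, so y[t] never uses future inputs.
    """
    if dilation < 1:
        raise ValueError("dilation must be a positive integer")
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding keeps the filter causal
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past steps seen by a stack with one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Toy series standing in for daily streamflow (made-up values).
flow = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = causal_dilated_conv1d(flow, kernel=[1.0, 1.0], dilation=2)
print(smoothed)                       # [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(2, [1, 2, 4]))  # 8
```

Doubling the dilation at each layer lets the receptive field grow quickly with depth, which is one reason a convolutional branch can cover longer histories while a recurrent branch such as a GRU tracks step-by-step dynamics.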