Deep learning models offer a powerful approach to accurate and stable prediction of river water quality, which is crucial for intelligent management and control of the water environment. To improve the accuracy of water quality parameter prediction and to better understand how complex spatial information affects deep learning models, this study proposes two ensemble models, TNX (with temporal attention) and STNX (with spatio-temporal attention), both based on seasonal-trend decomposition using Loess (STL), to predict water quality from geo-sensory time series data. Dissolved oxygen, total phosphorus, and ammonia nitrogen were predicted at short steps (1 h and 2 h) and long steps (12 h and 24 h) using data from seven water quality monitoring sites along a river. Relative to the best baseline deep learning model, the ensemble model TNX improved performance by 2.1%–6.1% for short-step prediction and by 4.3%–22.0% for long-step prediction, and it captured the variation pattern of the water quality parameters by predicting only the trend component of the raw data after STL decomposition. The STNX model, with spatio-temporal attention, achieved a further 0.5%–2.4% and 2.3%–5.7% improvement over the TNX model for short-step and long-step prediction, respectively, and this improvement was particularly effective in mitigating the prediction shift patterns observed in long-step prediction. Moreover, the model interpretation results consistently showed positive relationship patterns across all monitoring sites. However, the significance of the seven monitoring sites as model inputs diminished as the distance between the predicted site and the input site increased. This study provides an ensemble modeling approach based on STL decomposition for improving short-step and long-step prediction of river water quality parameters, and it offers insight into the impact of complex spatial information on deep learning models.