An accurate and reliable predictive model is essential for water resources planning and management. Standalone models, whether physics-based or data-driven, each have their own applications, strengths, and limitations. In this study, a hybrid model (SWAT-Transformer) was developed by coupling the physics-based Soil and Water Assessment Tool (SWAT) with the data-driven Transformer to improve the accuracy of monthly streamflow prediction. SWAT was first constructed and calibrated, and its outputs were then used as part of the inputs to Transformer, so that Transformer corrects the prediction errors of SWAT and the two models are coupled. Monthly runoff data from the Yan’an and Ganguyi stations on the Yan River, a first-order tributary of the Yellow River, were used to evaluate the proposed model’s performance. The results indicated that SWAT performed well in predicting high flows but poorly in predicting low flows. In contrast, Transformer captured low-flow periods more accurately and outperformed SWAT overall. SWAT-Transformer corrected the errors in the SWAT predictions and overcame the limitations of either single model. By integrating SWAT’s detailed representation of physical processes with Transformer’s strength in time-series modeling, the coupled model significantly improved streamflow prediction accuracy. The proposed model offers more accurate and reliable predictions to support water resource management, which is crucial for sustainable economic and societal development.
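
The coupling step described above, in which SWAT outputs become part of the inputs to the data-driven model, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `make_windows`, the 12-month lookback, the single precipitation forcing, and the synthetic series are hypothetical and are not taken from the study. To keep the example self-contained, an ordinary least-squares fit stands in for the Transformer, so the sketch shows only the data flow of the error correction, not the study's model.

```python
import numpy as np

def make_windows(forcing, swat_sim, observed, lookback):
    """Build inputs of shape (samples, lookback, 2) ending at month t,
    paired with the observed flow at month t (hypothetical layout)."""
    feats = np.column_stack([forcing, swat_sim])  # (months, 2)
    idx = range(lookback - 1, len(observed))
    X = np.stack([feats[t - lookback + 1:t + 1] for t in idx])
    y = observed[lookback - 1:]
    return X, y

def rmse(pred, obs):
    """Root-mean-square error between two equal-length series."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Synthetic placeholder series, NOT data from the study.
rng = np.random.default_rng(0)
months, lookback = 120, 12
precip = rng.gamma(2.0, 20.0, months)
observed = 0.3 * precip + 5.0 + rng.normal(0.0, 1.0, months)
# Stand-in for calibrated SWAT output, deliberately biased.
swat_sim = 0.4 * precip + 1.0 + rng.normal(0.0, 2.0, months)

X, y = make_windows(precip, swat_sim, observed, lookback)

# Stand-in learner (assumption): least squares on flattened windows.
# In the study this role is played by a Transformer.
A = np.hstack([X.reshape(len(X), -1), np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
corrected = A @ coef

rmse_swat = rmse(swat_sim[lookback - 1:], y)
rmse_coupled = rmse(corrected, y)
print(f"SWAT alone: {rmse_swat:.2f}, coupled (in-sample): {rmse_coupled:.2f}")
```

The comparison here is in-sample only. A real evaluation would hold out a separate validation period and use a sequence model in place of the linear stand-in.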