Amid growing concern over climate-induced extreme weather events, accurate flood forecasting has become imperative, especially in regions such as the Chaersen Basin, where data scarcity compounds the challenge. Traditional hydrological models, while reliable, often fall short in areas with insufficient observational data. This study introduces a hybrid modeling approach that combines the deep learning capabilities of the Informer model with the physically based hydrological simulation of the WRF-Hydro model to improve runoff predictions in such data-sparse regions. The Informer model was first trained on the large and diverse CAMELS dataset from the United States and then, through transfer learning, applied to predict runoff in the Chaersen Basin, bridging the local data gap. In parallel, the WRF-Hydro model, forced with Global Forecast System (GFS) data, provided a baseline for comparison and for further refinement of flood prediction accuracy. Combining the Informer model's pattern-recognition capability with the physical modeling strength of WRF-Hydro substantially improved prediction accuracy. For 2015 and 2016, the hybrid model achieved notable gains in both the Nash–Sutcliffe Efficiency (NSE) and the Index of Agreement (IOA), confirming its ability to capture complex hydrological dynamics in runoff prediction. In 2015, the NSE improved from 0.50 with WRF-Hydro and 0.63 with the Informer model to 0.66 with the hybrid model; in 2016, it increased from 0.42 to 0.76. Similarly, the IOA in 2015 rose from 0.83 with WRF-Hydro and 0.84 with the Informer model to 0.87 with the hybrid approach, and in 2016 it increased from 0.78 to 0.92.
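The two evaluation metrics quoted above can be computed as shown in the minimal Python sketch below, which uses the standard definitions: NSE is one minus the ratio of squared simulation errors to the variance of the observations, and IOA is Willmott's index of agreement. The abstract does not say which IOA variant the study used, so Willmott's original squared-error form is an assumption here, and the function names `nse` and `ioa` are illustrative.

```python
from statistics import fmean
from typing import Sequence


def _check(observed: Sequence[float], simulated: Sequence[float]) -> None:
    # Both metrics compare paired time steps, so the series must align.
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1.0 is a perfect fit; 0.0 means the
    simulation is no better than predicting the observed mean."""
    _check(observed, simulated)
    o_mean = fmean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - o_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def ioa(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Willmott's Index of Agreement d in [0, 1]; 1.0 is perfect agreement.
    (Assumed variant: the original squared-error form.)"""
    _check(observed, simulated)
    o_mean = fmean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # "Potential error": deviations of both series from the observed mean.
    potential = sum(
        (abs(s - o_mean) + abs(o - o_mean)) ** 2
        for o, s in zip(observed, simulated)
    )
    return 1.0 - sse / potential
```

For instance, with observations `[1.0, 2.0, 3.0, 4.0]` and a simulation `[1.5, 2.0, 2.5, 4.5]`, `nse` gives 0.85 and `ioa` gives about 0.962. IOA penalizes the same errors less harshly than NSE because its denominator grows with the simulation's own spread.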
Further analysis of the relative contributions of WRF-Hydro and the Informer model showed that the hybrid model performed best when the Informer model's contribution was kept between 60% and 80%.
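The abstract does not specify how the two model outputs are combined. One plausible reading, assumed here purely for illustration, is a linear weighted average in which a weight `w` multiplies the Informer runoff and `1 - w` multiplies the WRF-Hydro runoff. The sketch below sweeps `w` over a grid and scores each blend by NSE. The helpers `blend` and `sweep_weights` are hypothetical, and the series in the usage note are synthetic, not the study's data.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency (1.0 = perfect fit)."""
    o_mean = fmean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - o_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def blend(q_informer: Sequence[float], q_wrf: Sequence[float], w: float) -> List[float]:
    """Hypothetical linear blend (an assumption, not the paper's stated scheme):
    w weights the Informer output and 1 - w weights the WRF-Hydro output."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(q_informer) != len(q_wrf):
        raise ValueError("model series must be of equal length")
    return [w * a + (1.0 - w) * b for a, b in zip(q_informer, q_wrf)]


def sweep_weights(
    observed: Sequence[float],
    q_informer: Sequence[float],
    q_wrf: Sequence[float],
    steps: int = 10,
) -> List[Tuple[float, float]]:
    """Return (w, NSE) for w = 0, 1/steps, ..., 1 (the Informer share)."""
    return [
        (i / steps, nse(observed, blend(q_informer, q_wrf, i / steps)))
        for i in range(steps + 1)
    ]
```

For example, with the synthetic series `obs = [10, 30, 80, 40, 20]`, `informer = [12, 28, 70, 45, 18]` and `wrf = [5, 40, 95, 30, 25]`, `max(sweep_weights(obs, informer, wrf), key=lambda p: p[1])` returns the pair for `w = 0.7`. A grid sweep is used because it keeps the whole performance curve visible, which is what a statement such as "best between 60% and 80%" describes, rather than only a single optimal point.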