In this paper, we address 24-h streamflow forecasting with deep-learning models, focusing primarily on the transformer architecture, which has seen limited application to this task. We compare five models across four distinct regions: a persistence baseline, long short-term memory (LSTM), gated recurrent unit (GRU), Seq2Seq, and transformer. The evaluation uses three performance metrics: Nash-Sutcliffe Efficiency (NSE), Pearson's r, and normalized root mean square error (NRMSE). We also investigate how two data extension methods, zero-padding and persistence, affect the models' predictive capabilities. Our findings show that the transformer captures complex temporal dependencies and patterns in the streamflow data better than the other models, outperforming all of them in both accuracy and reliability. Specifically, the transformer improved NSE scores by up to 20% compared to the other models. These results underline the value of advanced deep-learning techniques such as the transformer in hydrological modeling and streamflow forecasting for effective water resource management and flood prediction.
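As an illustration of the three evaluation metrics named above, the following is a minimal sketch in Python using their standard textbook definitions. The function names are hypothetical, and the normalization used for NRMSE (here, division by the mean of the observations) is an assumption, since the abstract does not state which variant the study uses.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated flow."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.corrcoef(obs, sim)[0, 1])


def nrmse(obs, sim):
    """RMSE divided by the mean observation (assumed normalization)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return float(rmse / obs.mean())


# Toy example: a forecast that is biased high by one unit everywhere.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
forecast = [2.0, 3.0, 4.0, 5.0, 6.0]
print(nse(observed, forecast), pearson_r(observed, forecast), nrmse(observed, forecast))
```

The toy example shows why the metrics are reported together: a constant bias leaves Pearson's r at its maximum while lowering NSE and raising NRMSE, so correlation alone would overstate forecast quality.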