Water level is an important indicator of lake hydrological characteristics, and its fluctuation significantly affects lake ecosystems. In recent years, deep learning models have proven effective for long-range prediction of hydrological processes, but deep learning models with an attention mechanism have rarely been applied to lake water level prediction. In this paper, taking Poyang Lake as a case study, a transformer neural network model is applied to examine model performance in lake water level prediction, to explore the effects of the Yangtze River on lake water level fluctuations, and to analyze the influence of hyper-parameters (window size and number of model layers) and lead time on model accuracy. The results indicate that the transformer model performs well in simulating lake water level variations and reflects the temporal characteristics of water level variation in Poyang Lake. In the testing stage, the RMSE values were in the range of 0.26–0.70 m, and the NSE values were higher than 0.94. Moreover, the Yangtze River inflow has a great influence on the water level fluctuation of Poyang Lake, especially in the flood and receding periods. The contribution rate of the Yangtze River in terms of RMSE and NSE is higher than 80% and 270%, respectively. Hyper-parameters such as window size and number of model layers also significantly influence the simulation accuracy of the transformer model. In this study, a window size of 90 d and 6 model layers were the most suitable hyper-parameters for water level prediction in Poyang Lake. Lead time also affects model accuracy in lake water level prediction. With lead times of one to seven days, the model accuracy was high, with RMSE values in the range of 0.46–0.73 m, whereas the RMSE increased to 1.37 m and 1.82 m at lead times of 15 and 30 days, respectively.
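To make the roles of window size, lead time, RMSE, and NSE concrete, the following is a minimal Python sketch, not the implementation used in this study. It assumes a daily water level series stored as a one-dimensional array. The function names (`make_windows`, `rmse`, `nse`) and the synthetic series are hypothetical and introduced only for illustration. The transformer model itself is not shown; a naive persistence forecast stands in for it, and RMSE and NSE follow their standard definitions.

```python
import numpy as np


def make_windows(series, window_size, lead_time):
    """Pair each window of past values with the value `lead_time` steps
    after the window's last observation (hypothetical helper)."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window_size - lead_time + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window size and lead time")
    # Row i holds observations i .. i + window_size - 1.
    X = np.stack([series[i:i + window_size] for i in range(n_samples)])
    # Target for row i is the observation at i + window_size - 1 + lead_time.
    y = series[window_size + lead_time - 1:]
    return X, y


def rmse(obs, sim):
    """Root mean square error, in the units of the series (e.g. metres)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


# Synthetic seasonal series for illustration only; NOT Poyang Lake data.
days = np.arange(730)
levels = 12.0 + 4.0 * np.sin(2.0 * np.pi * days / 365.0)

# Window size of 90 d as in the abstract; lead time of 7 d as one example.
X, y = make_windows(levels, window_size=90, lead_time=7)

# Persistence baseline: forecast the last value of each window.
persistence = X[:, -1]
print(X.shape, y.shape)
print("RMSE (m):", rmse(y, persistence))
print("NSE:", nse(y, persistence))
```

Increasing `lead_time` in this sketch moves each target further from its input window, which is the setting in which the abstract reports RMSE rising from 0.46–0.73 m (one to seven days) to 1.37 m and 1.82 m (15 and 30 days).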
The transformer neural network model constructed in this paper is the first to be applied to lake water level forecasting, and it showed high efficiency in Poyang Lake. More broadly, few studies have used transformer models, which are built on the attention mechanism, to forecast hydrological processes. We therefore suggest applying the model to long sequence time-series forecasting of hydrological processes in other lakes to test its performance, which would provide further scientific evidence for lake flood control and lake resource management.