Electricity load prediction is the primary basis on which power-related departments make sound and effective generation plans and scheduling plans for efficient power utilization. The continued development of deep learning has provided new approaches to short-term load prediction. Taking into consideration the temporal and nonlinear characteristics of power system load data, and further considering the impact of historical and future information on the current state, this paper proposes a Seq2seq short-term load prediction model based on a long short-term memory network (LSTM). First, the periodic fluctuation characteristics of users’ load data are analyzed and the correlation of the load data is established in order to determine the model’s order in the time series. Second, the specifications of the Seq2seq model are selected, and the Residual mechanism (Residual) is combined with two Attention mechanisms (Attention). Then, after comparing the predictive performance of the model under the different types of Attention mechanism, this paper adopts a Seq2seq short-term load prediction model that combines Residual LSTM with the Bahdanau Attention mechanism. The resulting prediction model obtains better results when applied to the actual power system load data of a certain place. To validate the developed model, the Seq2seq model was compared with recurrent neural network (RNN), LSTM, and gated recurrent unit (GRU) algorithms, and performance indices were calculated. When the models were trained and tested with power system load data, the root mean square error (RMSE) of Seq2seq was 6.61%, 16.95%, and 7.80% lower than that of RNN, LSTM, and GRU, respectively.
In addition, a supplementary case study was carried out using data for a small power system under different weather conditions and user behaviors in order to confirm the applicability and stability of the proposed model. In the cases examined, the Seq2seq model for short-term load prediction outperformed the compared models, exhibiting more accurate predictions and stable performance.
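
As a minimal sketch of how an RMSE comparison of the kind reported above could be computed, the following Python example evaluates the RMSE of two sets of predictions and the percentage by which one is lower than the other. The function names, the load values, and the choice to express the reduction relative to the baseline RMSE are assumptions made for illustration; they are not taken from the paper's data or code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_reduction_pct(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse.

    Assumption: the reduction is measured relative to the baseline RMSE.
    """
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical load values (not data from the paper), for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
baseline_pred = [104.0, 106.0, 124.0, 126.0]  # every error has magnitude 4
model_pred = [103.0, 107.0, 123.0, 127.0]     # every error has magnitude 3

baseline = rmse(actual, baseline_pred)
model = rmse(actual, model_pred)
reduction = rmse_reduction_pct(baseline, model)
print(f"baseline RMSE={baseline:.2f}, model RMSE={model:.2f}, "
      f"reduction={reduction:.2f}%")
```

With these made-up values the baseline RMSE is 4, the model RMSE is 3, and the relative reduction is 25%. The same two functions could be applied to each baseline (RNN, LSTM, GRU) in turn to obtain one reduction figure per comparison.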