Short-term load forecasting is an important prerequisite for smart grid controls. The current methods are mainly based on the convolution neural network (CNN) or long short-term memory (LSTM) model to realize load forecasting. For the multi-factor input sequence, the existing methods cannot obtain multi-scale features of the time series and the important parameters of the multi-factor, resulting in low accuracy and robustness. To address these problems, a multi-scale feature attention hybrid network is proposed, which uses LSTM to extract the time correlation of the sequence and multi-scale CNN to automatically extract the multi-scale feature of the load. This work realizes the integration of features by constructing a circular network. In the proposed model, a two-branch attention mechanism is further constructed to capture the important parameters of different influencing factors to improve the model’s robustness, which can make the network to obtain effective features at the curve changes. Comparative experiments on two open test sets show that the proposed multi-scale feature attention mixture network can achieve accurate short-term load forecasting and is superior to the existing methods.