The hot spot temperature of transformer windings is an important indicator of insulation performance, and its accurate inversion is crucial for timely and accurate fault prediction in transformers. However, most existing studies feed experimental or operational data directly into networks to build data-driven models without considering the time lag between temperatures, which can limit the accuracy of the inversion model. In this paper, a method for inverting the hot spot temperature of transformer windings based on the SA-GRU model is proposed. First, temperature rise experiments are designed to collect the temperatures over the entire side and top surfaces of the transformer tank, the top oil temperature, the ambient temperature, the inlet and outlet temperatures of the cooling system, and the winding hot spot temperature. Second, the experimental data are integrated, taking the time lag of the data into account, to obtain candidate input features. Then, a feature selection algorithm based on mutual information (MI) is used to analyze the correlation between the candidate features and the hot spot temperature and to construct the optimal feature subset that maximizes information gain. Finally, a self-attention (SA) mechanism is applied to optimize the Gated Recurrent Unit (GRU) network, establishing the SA-GRU model, which captures the latent relationships between the input features and the output feature, achieving precise inversion of the winding hot spot temperature. The experimental results show that accounting for the time lag of the data allows the winding hot spot temperature to be inverted more accurately. The proposed inversion method reduces redundant input features, lowers model complexity, accurately tracks the changing trend of the hot spot temperature, and achieves higher inversion accuracy than other classical models.
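The lag-aware candidate construction and MI-based feature selection steps can be illustrated with a short sketch. The abstract does not specify the lag range, the MI estimator, or the subset-search rule, so everything below is an assumption: lagged copies of each temperature series form the candidates, MI is estimated by equal-width binning, and the subset is chosen by greedy max-relevance/min-redundancy (mRMR), a common MI-based criterion swapped in here that is not necessarily the paper's algorithm. The function names (`build_lagged_candidates`, `mrmr_select`, and so on) and the synthetic data are hypothetical, and the SA-GRU model itself is not shown.

```python
import math
from collections import Counter


def discretize(values, bins=8):
    """Map continuous values to equal-width bin indices 0..bins-1 (assumed estimator)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (in nats) between two equal-length series."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    n = len(xd)
    pxy = Counter(zip(xd, yd))
    px, py = Counter(xd), Counter(yd)
    # p(a,b) * log(p(a,b) / (p(a) p(b))) with counts: c/n * log(c*n / (px*py))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def build_lagged_candidates(series, target, max_lag):
    """Hypothetical helper: feature 'name@k' at time t holds series[name][t - k].

    Rows with t < max_lag are dropped so every lag is defined; the target
    is trimmed the same way so all lists stay aligned.
    """
    n = len(target)
    candidates = {}
    for name, values in series.items():
        for k in range(max_lag + 1):
            candidates[f"{name}@{k}"] = [values[t - k] for t in range(max_lag, n)]
    return candidates, target[max_lag:]


def mrmr_select(candidates, target, n_select, bins=8):
    """Greedy mRMR: maximize MI with the target minus mean MI with already chosen features."""
    relevance = {f: mutual_information(v, target, bins) for f, v in candidates.items()}
    selected, remaining = [], set(candidates)
    while remaining and len(selected) < n_select:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(candidates[f], candidates[s], bins) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Synthetic, illustrative data only: the "hot spot" follows top oil with a 3-step lag.
    n = 200
    top_oil = [50 + 10 * math.sin(t / 10) for t in range(n)]
    ambient = [20 + 2 * math.cos(t / 37) for t in range(n)]
    hot_spot = [top_oil[max(t - 3, 0)] + 15 for t in range(n)]
    cands, y = build_lagged_candidates(
        {"top_oil": top_oil, "ambient": ambient}, hot_spot, max_lag=5
    )
    print(mrmr_select(cands, y, n_select=3))
```

On this synthetic data the lag-3 top-oil feature is picked first, which is the kind of effect the lag-aware step is meant to expose; in practice, the bin count and maximum lag would need tuning to the sampling interval of the measured temperatures.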