In high-voltage transmission lines, tension towers must bear the tension load of the overhead conductors over long periods, which makes them prone to damage and a key target of line inspection. In current practice, operation and maintenance personnel mostly use unmanned aerial vehicles (UAVs) to photograph the individual parts of a tension tower and then classify and name the resulting large volume of inspection images by hand, a process that is both inaccurate and inefficient. To address this problem, this paper collects a large number of real UAV inspection images of the various parts of tension towers and proposes the Stower-13 inspection image dataset, which is used to train classification models for the automatic classification and naming of inspection images. Based on this dataset, this paper also proposes an improved MobileViT model that introduces the Scale-Correlated Pyramid Convolution Block Attention Block (SCPCbam) module. This module adds the Convolution Block Attention Module (CBAM) to each of the four branches of the original Scale-Correlated Pyramid Convolution (SCPC) module, strengthening multi-scale information extraction and improving classification accuracy. Experiments show that the proposed dataset helps the model learn feature information. They also show that the improved MobileViT model extracts image spatial feature information well and achieves higher classification accuracy than other models of the same type, so it can cope with a wide range of problems that arise in practice and meets the practical need for automatic naming of transmission line inspection images.
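
The automatic naming step that follows classification can be illustrated with a minimal sketch. It assumes a trained classifier has already produced one score per class for each image. The class names, the `label_index.ext` naming scheme, and the helper functions are hypothetical, since the abstract specifies neither the Stower-13 categories nor the naming convention.

```python
from pathlib import Path

# Hypothetical class names: the abstract does not list the Stower-13 categories.
CLASS_NAMES = ["part_a", "part_b", "part_c"]


def predicted_label(scores, class_names):
    """Return the class name with the highest score (argmax over the scores)."""
    if len(scores) != len(class_names):
        raise ValueError("one score per class is required")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best]


def named_path(image_path, label, index):
    """Build a new file name from the label, keeping the folder and extension."""
    p = Path(image_path)
    return p.with_name(f"{label}_{index:04d}{p.suffix}")


def plan_renames(predictions, class_names):
    """Map each image path to its proposed new path.

    `predictions` is a list of (path, scores) pairs. A per-label counter keeps
    the generated names unique within each class. Nothing is renamed on disk.
    """
    counters = {}
    plan = {}
    for path, scores in predictions:
        label = predicted_label(scores, class_names)
        counters[label] = counters.get(label, 0) + 1
        plan[path] = named_path(path, label, counters[label])
    return plan
```

The sketch only builds a rename plan instead of touching files, so the proposed names can be reviewed before being applied, for example with `Path.rename`.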