In strip steel production, identifying surface defects is crucial for improving product quality and keeping downstream processes running smoothly. Existing techniques suffer from low detection efficiency and are susceptible to environmental noise. This article adopts an automated deep learning approach that does not require explicit modeling of complex environmental variations, and proposes an improved RepVGG model (ViT‐RepVGG) for surface defect identification. Building on the RepVGG architecture, the study examines how incorporating the self‐attention mechanism of the Vision Transformer (ViT) under different insertion strategies affects model performance. The optimized model is then compared with classic network models and with recently published models in terms of identification performance. The study also examines how performance varies under different hyperparameter settings and evaluates identification performance on six defect types. The results indicate that the best ViT‐RepVGG performance is obtained by adding the ViT module to stage 3 of RepVGG‐A1, with the learning rate, optimizer, and activation function set to 0.0001, Adam, and GELU, respectively. These findings show that incorporating a self‐attention mechanism into a convolutional backbone can improve classification performance, providing an effective foundation for the online identification of strip steel surface defects.
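The sketch below illustrates the general idea of applying ViT‐style self‐attention to a convolutional stage's output. It is a minimal, framework‐free example and not the authors' ViT‐RepVGG implementation. A C×H×W feature map, such as one produced by stage 3, is flattened into H·W tokens of dimension C. Single‐head scaled dot‐product attention with a residual connection is applied to those tokens, and the result is reshaped back to C×H×W so that the following stage can consume it unchanged.

Several things here are assumptions made for illustration. The helper names are hypothetical. A real ViT block also uses multi‐head attention, layer normalization, and an MLP (typically with GELU). The abstract does not specify the paper's exact block design.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def feature_map_to_tokens(fmap):
    """Flatten fmap[c][h][w] (C x H x W) into H*W tokens, each a length-C vector."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][h][w] for c in range(C)] for h in range(H) for w in range(W)]


def tokens_to_feature_map(tokens, H, W):
    """Inverse of feature_map_to_tokens: rebuild a C x H x W feature map."""
    C = len(tokens[0])
    return [[[tokens[h * W + w][c] for w in range(W)] for h in range(H)] for c in range(C)]


def self_attention_block(fmap, wq, wk, wv):
    """Hypothetical single-head self-attention over spatial positions, plus a residual.

    wq and wk are C x d projection matrices; wv must be C x C so the residual add works.
    """
    H, W = len(fmap[0]), len(fmap[0][0])
    x = feature_map_to_tokens(fmap)              # (N x C), with N = H*W
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]          # K transposed: (d x N)
    scores = matmul(q, kt)                       # (N x N) token-to-token scores
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    out = matmul(attn, v)                        # (N x C) attention-weighted values
    y = [[xi + oi for xi, oi in zip(xr, orow)] for xr, orow in zip(x, out)]
    return tokens_to_feature_map(y, H, W)


if __name__ == "__main__":
    random.seed(0)
    C, H, W = 4, 2, 3                            # toy sizes, not the paper's
    fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    rand_mat = lambda: [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C)]
    out = self_attention_block(fmap, rand_mat(), rand_mat(), rand_mat())
    print(len(out), len(out[0]), len(out[0][0]))  # same C x H x W shape as the input
```

Because the block preserves the feature map's shape, it can in principle be placed after any stage. The abstract reports that the authors evaluated several placement strategies and found stage 3 of RepVGG‐A1 to work best.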