In pavement distress recognition and classification, feature extraction is challenging for several reasons: the pavement environment is complex, distresses occupy a small proportion of each image, distress scales vary significantly, and objects such as vehicles and traffic signs interfere with the data. This paper proposes a spectrum focus transformer (SFT) layer, which processes the signal spectrum and focuses on important frequency components. First, the frequency-domain characteristics of the image data are analyzed to obtain the distribution of frequency values, which is used to fine-tune the different frequency components. Then, the frequency information and the images are learned and weighted in the frequency domain, which enhances the ability to capture pavement distress regions. In experiments on the road pavement distress dataset, heatmap analysis shows that distress regions receive increased attention, and the model achieves an accuracy of 97.73%, higher than that of the other models compared.
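The general idea of weighting frequency components, as described above, can be illustrated with a minimal NumPy sketch. This is not the SFT layer itself: the function name `frequency_weighting` and the fixed `weights` array are hypothetical stand-ins for whatever the model learns, and the example only shows the transform, reweight, and inverse-transform pattern on a single-channel image.

```python
import numpy as np


def frequency_weighting(image, weights):
    """Reweight the 2-D spectrum of a single-channel image.

    image:   (H, W) real-valued array.
    weights: (H, W) real-valued per-frequency gains. In a trained model these
             would be learned; here they are supplied by hand (assumption).
    """
    spectrum = np.fft.fft2(image)      # spatial domain -> frequency domain
    weighted = spectrum * weights      # emphasize or suppress each component
    # Back to the spatial domain; keep the real part, since arbitrary
    # weights can leave a small or non-zero imaginary component.
    return np.fft.ifft2(weighted).real


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))

# Unit weights leave the image unchanged.
restored = frequency_weighting(image, np.ones((8, 8)))

# Keeping only the zero-frequency (DC) component yields the image mean everywhere.
dc_only = np.zeros((8, 8))
dc_only[0, 0] = 1.0
mean_image = frequency_weighting(image, dc_only)
```

In a learned setting, `weights` would be a trainable parameter updated by gradient descent, so the network itself decides which frequency bands to emphasize.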