Compared with surface defect detection for industrial products manufactured under specified processes, detecting surface defects on naturally grown red jujubes poses distinct challenges. The defects are highly diverse, differ only subtly from the background, have low contrast, vary in scale, and appear in images with high levels of noise, all of which complicate the detection task. Existing methods fall short on these issues, mainly because of insufficient feature extraction capability and overly complex network structures, which limit model efficiency and practical performance. To address these challenges, this study proposes RJ-TinyViT, an optimized Tiny Vision Transformer (TinyViT) network. The method streamlines the TinyViT-5m network structure to reduce the network burden and introduces an improved Multi-Kernel Block (MK Block) and an improved Mobile Inverted Bottleneck Convolution Block (MBConv Block) to strengthen feature extraction. It also integrates the Coordinate Attention (CA) module to improve the model's ability to recognize and focus on surface defect features of red jujubes. Experimental results show that RJ-TinyViT achieves a classification accuracy of 93.38%, an improvement of 1.84% over the original TinyViT network, while its floating-point operations (FLOPs) and parameters (Params) are reduced to 58.97% and 39.84% of the original TinyViT network, respectively. These results demonstrate that RJ-TinyViT is lightweight while maintaining high accuracy, and they highlight its value in practical industrial applications.
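The reported ratios can be restated as reductions with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the baseline accuracy is derived under the assumption that the 1.84% improvement is measured in percentage points, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract (percent of the original TinyViT network).
FLOPS_REMAINING = 58.97
PARAMS_REMAINING = 39.84
RJ_TINYVIT_ACCURACY = 93.38
ACCURACY_GAIN = 1.84


def reduction_percent(remaining_percent: float) -> float:
    """Percentage removed when a quantity shrinks to `remaining_percent` of its original."""
    return round(100.0 - remaining_percent, 2)


def shrink_factor(remaining_percent: float) -> float:
    """How many times smaller the quantity is than the original."""
    return 100.0 / remaining_percent


def baseline_accuracy(new_accuracy: float, gain: float) -> float:
    """Baseline accuracy, ASSUMING the gain is in percentage points (not relative)."""
    return round(new_accuracy - gain, 2)


if __name__ == "__main__":
    print("FLOPs reduced by (%):", reduction_percent(FLOPS_REMAINING))
    print("Params reduced by (%):", reduction_percent(PARAMS_REMAINING))
    print("FLOPs shrink factor:", round(shrink_factor(FLOPS_REMAINING), 2))
    print("Params shrink factor:", round(shrink_factor(PARAMS_REMAINING), 2))
    print("Implied baseline accuracy (%):",
          baseline_accuracy(RJ_TINYVIT_ACCURACY, ACCURACY_GAIN))
```

Under that assumption, the figures correspond to roughly 41% fewer FLOPs and 60% fewer parameters than the original network.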