Abstract

In 3C assembly scenarios, which involve many semi-flexible, heterogeneous, small, and slender targets, traditional target detection algorithms suffer from low accuracy, weak generalization, large model sizes, and slow inference. To address these issues, this study proposes YOLOv5-GTB, an enhanced method based on the YOLOv5 model. The method integrates a Bidirectional Feature Pyramid Network (BiFPN), Ghost lightweight convolution, a Vision Transformer (ViT), and an adaptive activation function to improve both the accuracy and the speed of target detection. Ghost convolution is used to construct a Ghost bottleneck layer, which optimizes the feature extraction network, significantly reducing computational cost while strengthening the network's feature extraction capability. The Cross-Stage Partial Network (CSPNet) architecture is employed to split the data flow of the input feature map, improving the efficiency of gradient processing. We further introduce a fused CNN-Transformer structure that combines the strength of convolutional neural networks in local feature extraction with the ViT's ability to capture long-range information, further improving the feature extraction performance of the overall network. For feature fusion, because the traditional top-down unidirectional information flow cannot effectively merge features carrying both location and semantic information, BiFPN is incorporated into YOLOv5-GTB. This strengthens the fusion between the feature layer extracted at the end of the backbone network and the fused feature layer of the first module, thereby improving detection accuracy. Ablation and comparative experiments on the 3C assembly scenario target detection dataset demonstrate that YOLOv5-GTB offers significant advantages in accuracy and speed. Finally, the model is applied to a 3C assembly platform, where it achieves rapid and accurate target recognition.
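
To give a rough sense of why Ghost convolution reduces computational cost, the sketch below compares the weight count of a standard convolution with that of a Ghost convolution of the same input and output width. It follows the general Ghost module idea (a smaller primary convolution plus cheap depthwise operations that generate the remaining feature maps) and is not taken from the YOLOv5-GTB implementation. The function names, the ratio of 2, the 3x3 depthwise kernel, and the channel sizes are illustrative assumptions, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int,
                      ratio: int = 2, dw_k: int = 3) -> int:
    """Weights in a Ghost convolution (bias ignored).

    Assumed structure (hypothetical defaults, not from the paper):
      1. a primary k x k convolution produces c_out / ratio "intrinsic" maps;
      2. each intrinsic map yields (ratio - 1) extra "ghost" maps through a
         cheap dw_k x dw_k depthwise convolution.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k            # ordinary convolution
    cheap = intrinsic * (ratio - 1) * dw_k * dw_k  # depthwise operations
    return primary + cheap


if __name__ == "__main__":
    # Illustrative layer size: 64 input channels, 128 output channels, 3x3 kernel.
    std = standard_conv_params(64, 128, 3)
    ghost = ghost_conv_params(64, 128, 3)
    print(f"standard: {std} weights, ghost: {ghost} weights")
    print(f"reduction factor: {std / ghost:.2f}")
```

Under these assumptions the Ghost layer needs roughly half the weights of the standard layer, and the reduction factor approaches the chosen ratio as the input channel count grows.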
