Abstract

To improve the accuracy and processing speed of object detection in a weakly supervised learning setting, a weakly supervised real-time object detection method based on saliency map extraction and an improved YOLOv5 is proposed. When only image-level annotations are available, class-specific saliency maps are generated from the backpropagation process of a VGG-16-based classification network. The position information of the target obtained from these maps is used to generate a pseudo bounding box, which then serves as the ground-truth bounding box for optimizing the real-time object detection network. An improved YOLOv5 model is proposed that transfers clear target features to deeper network layers through a skip-connection operation, addressing the problem of feature ambiguity. In addition, a convolutional attention module is introduced to reduce the impact of uninformative features on recognition accuracy. Experiments on the PASCAL VOC 2007 and 2012 datasets show that, when only image-level annotations are available in the training data, the proposed method improves processing speed while maintaining good detection accuracy, realizing real-time object detection under weakly supervised conditions.
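
The following is a minimal sketch, not the authors' released code, of the saliency-to-pseudo-box step summarized above: a VGG-16 classifier yields a class-specific saliency map via backpropagation of the class score to the input image, and the map is thresholded into a pseudo bounding box. The model weights, the 0.5 relative threshold, and the tight-box heuristic are assumptions for illustration.

```python
# Sketch of pseudo bounding box generation from a class-specific saliency map.
# Assumes a pretrained VGG-16 classifier; threshold and box heuristic are illustrative.
import torch
import torchvision
import numpy as np

def class_saliency_map(model, image, class_idx):
    """Gradient of the class score w.r.t. the input image (backpropagation-based saliency)."""
    image = image.clone().unsqueeze(0).requires_grad_(True)   # [1, 3, H, W]
    score = model(image)[0, class_idx]                        # class-specific score
    score.backward()
    # Max absolute gradient over the channel axis -> [H, W] saliency map
    return image.grad.detach().abs().max(dim=1)[0].squeeze(0)

def pseudo_box_from_saliency(saliency, rel_threshold=0.5):
    """Threshold the saliency map and take the tight box around salient pixels."""
    mask = (saliency >= rel_threshold * saliency.max()).cpu().numpy()
    ys, xs = np.where(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())  # x1, y1, x2, y2

# Usage: the (x1, y1, x2, y2) box would act as the pseudo ground-truth label
# when training the detection network from image-level annotations only.
model = torchvision.models.vgg16(weights="IMAGENET1K_V1").eval()
image = torch.rand(3, 224, 224)          # stand-in for a normalized input image
saliency = class_saliency_map(model, image, class_idx=12)
print(pseudo_box_from_saliency(saliency))
```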
