Abstract

Weakly supervised object localization (WSOL) has attracted intense interest in computer vision because it does not require instance-level annotations. Many existing works rely on convolutional neural network (CNN)-based methods, which are powerful at extracting and representing features. The main challenge for CNN-based WSOL methods is to obtain features that cover entire target objects rather than only their most discriminative parts. To address this challenge and improve the detection performance of feature-extraction-based WSOL methods, this paper presented a CNN-based two-branch model that locates objects using weakly supervised learning. The model consisted of a detection branch and a self-attention branch. During training, the two branches interacted: each branch treated the segmentation mask produced by the other as its own pseudo ground-truth labels. The self-attention mechanism allowed the model to capture information from all object parts. Additionally, we embedded multi-scale detection into the two-branch method to output features at two scales. We evaluated the two-branch network on the CUB-200-2011 and VOC2007 datasets. The pointing localization, intersection over union (IoU) localization, and correct localization precision (CorLoc) results were competitive with those of other state-of-the-art WSOL methods.
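The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the paper's evaluation code. It assumes one predicted box and one ground-truth box per image, boxes given as (x_min, y_min, x_max, y_max), the commonly used IoU threshold of 0.5 for CorLoc, and a simplified pointing test with no pixel tolerance. The function names `iou`, `corloc`, and `pointing_hit` are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of images whose predicted box reaches the IoU threshold.

    Assumption: exactly one predicted and one ground-truth box per image.
    """
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and of equal length")
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) >= threshold)
    return hits / len(preds)


def pointing_hit(point: Tuple[float, float], gt: Box) -> bool:
    """True if the (x, y) location of the peak response lies inside the box."""
    x, y = point
    return gt[0] <= x <= gt[2] and gt[1] <= y <= gt[3]
```

For example, `corloc([(0, 0, 2, 2), (0, 0, 1, 1)], [(0, 0, 2, 2), (5, 5, 6, 6)])` counts the first image as a hit and the second as a miss.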

Highlights

  • Object detection is a fundamental task in computer vision

  • We presented a two-branch network for weakly supervised object localization (WSOL), with an embedded self-attention mechanism that improves feature expression by connecting object parts

  • We evaluated the performance of the two-branch network on the VOC2007 dataset

Summary

Introduction

Object detection is a fundamental task in computer vision. It has been widely applied in autonomous driving and intelligent security systems. Training object detection methods requires many instance-level annotations, which are time-consuming and labor-intensive to produce. Weakly supervised object localization (WSOL) performs detection with classification labels only (image-level labels); no bounding box labels (instance-level labels) are provided. The model is trained to classify objects from the given image-level labels. Using the features learned during classification, the model is then asked to predict the bounding boxes of the target objects.
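One common way to turn classification features into a box, shown here only as an illustrative sketch and not as the method of this paper, is to threshold a class response heatmap and take the tightest box around the surviving pixels. The function name `heatmap_to_box`, the relative threshold of 0.2, and the plain nested-list heatmap are all assumptions made for the example. Practical pipelines often keep only the largest connected region, which this sketch omits.

```python
from typing import Optional, Sequence, Tuple


def heatmap_to_box(
    heatmap: Sequence[Sequence[float]], rel_threshold: float = 0.2
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) in pixel indices, or None.

    Pixels whose response is at least rel_threshold * peak are treated as
    foreground; the box is the tightest one enclosing all of them.
    """
    if not heatmap or not heatmap[0]:
        return None
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        return None  # no positive response anywhere
    cutoff = rel_threshold * peak
    # Collect the coordinates of every pixel above the cutoff.
    xs = [x for row in heatmap for x, v in enumerate(row) if v >= cutoff]
    ys = [y for y, row in enumerate(heatmap) if any(v >= cutoff for v in row)]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 6x8 heatmap: a strong 2x3 response plus one weak stray pixel.
toy = [[0.0] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4, 5):
        toy[y][x] = 1.0
toy[0][0] = 0.1  # below 0.2 * peak, so it is ignored
box = heatmap_to_box(toy)
```

The failure mode discussed in the abstract shows up directly here: if the heatmap responds only to the most discriminative part, the resulting box covers only that part of the object.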

