Abstract

Supervised object detection models require fully annotated data for training the network. However, labeling large datasets is very time-consuming, so weakly supervised object detection (WSOD) serves as an alternative to fully supervised learning for the object detection task. Although many methods have been proposed for WSOD to date, their performance is still lower than that of supervised approaches, since WSOD is a very challenging task. The major problems with existing WSOD methods are partial object detection and false detections in clusters of objects of the same category. The majority of WSOD methods follow multiple instance learning approaches, which do not guarantee the completeness of detected objects. To address these issues, we propose a three-fold refinement strategy that is applied to proposals in order to learn complete instances. We generate class-specific localization maps by fusing class activation maps obtained from fused complementary classification networks. These localization maps are used to amend the detected proposals from the instance classification branch (detection network). Deep reinforcement learning networks are proposed to learn a decisive-agent and a rectifying-agent, based on the policy gradient algorithm, to further refine the proposals. The refined bounding boxes are then fed to the instance classification network. The refinement operations result in learning complete objects and greatly improve detection performance. Experimental results show that the proposed WSOD method achieves better detection performance than state-of-the-art methods on the PASCAL VOC2007 and VOC2012 benchmarks.

Highlights

  • Weakly supervised object detection (WSOD) has attracted considerable attention in the literature because it requires only image-level annotations to train an object detector

  • We present a robust proposal refinement module (PRM) that rectifies the proposals to be learned by the object detector network, which is retrained with the instance-level supervision generated by the PRM

  • The proposed method is evaluated on the PASCAL VOC2007 and VOC2012 datasets, which cover 20 object categories and are widely used as benchmarks for object detection


Summary

Introduction

Weakly supervised object detection (WSOD) has attracted considerable attention in the literature because it requires only image-level annotations to train an object detector. This has been made possible by the development of convolutional neural networks (CNNs) [1] and large-scale datasets [2] with at least image-level annotations. Multiple instance learning (MIL) imposes certain constraints: a positive bag contains at least one positive instance, and a negative bag contains only negative instances. Another major drawback of MIL is that the most likely positives are predicted using the existing classifier, which can result in faulty learning when those predictions are false positives, since the classifier cannot explicitly deduce the true positives in a given image [7].
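The MIL bag constraint described above can be illustrated with a minimal Python sketch. It assumes the common max-pooling formulation, in which a bag (an image) is labeled positive when its highest-scoring instance (proposal) passes a threshold. The function names, the scores, and the 0.5 threshold are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Sequence


def bag_score(instance_scores: Sequence[float]) -> float:
    """Max-pooling MIL: the bag score is the highest instance score."""
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


def bag_label(instance_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """A bag is positive if at least one instance scores above the threshold.

    The threshold value is an illustrative assumption.
    """
    return bag_score(instance_scores) >= threshold


def top_instance(instance_scores: Sequence[float]) -> int:
    """Index of the instance that MIL would treat as the most likely positive."""
    return max(range(len(instance_scores)), key=lambda i: instance_scores[i])


# Hypothetical classifier scores for the proposals of two images.
positive_bag = [0.1, 0.8, 0.3]   # one proposal is confidently positive
negative_bag = [0.1, 0.2]        # every proposal is negative

print(bag_label(positive_bag), top_instance(positive_bag))
print(bag_label(negative_bag))
```

Because only the top-scoring instance is selected, a proposal that covers just the most discriminative part of an object can be picked as the positive, which is one way the partial detections mentioned above can arise.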
