Abstract

Supervised object detection relies on fully annotated training data, which is expensive to collect. Weakly supervised object detection (WSOD), in contrast, uses only image-level annotations for training, which are much simpler to acquire. WSOD is challenging because it must learn object localization and detection from image-level labels alone. In this paper, we present an end-to-end framework for WSOD based on discriminative feature learning. We use an objectness technique to obtain initial proposals from the images. Two complementary networks are then trained in parallel to obtain discriminative image features, which are concatenated channel-wise with the features of a third network. We call this classification network, designed for discriminative feature learning, the fused complementary network. Using the complementary features, the network learns to assign higher probabilities to proposals that enclose whole object instances than to proposals that contain only object parts. Clustering is then performed hierarchically on the region proposals. Our clustering method, named instance clustering, first performs inter-class clustering and then iterative intra-class clustering, using the intersection-over-union (IoU) metric to obtain spatially adjacent cluster members corresponding to each object instance. In each intra-class clustering iteration, the highest-scoring proposal of each intra-class cluster is set as its centroid. Experiments are conducted on the PASCAL VOC2007 and PASCAL VOC2012 datasets. Both qualitative and quantitative results show improved WSOD performance on these benchmarks.
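The intra-class step of instance clustering described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`iou`, `intra_class_clusters`), the box format `(x1, y1, x2, y2)`, the greedy single-pass grouping, and the `iou_thresh` value of 0.5 are all assumptions made for illustration. The sketch also assumes the proposals passed in already belong to a single class, that is, that the inter-class step has been done.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def intra_class_clusters(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_thresh: float = 0.5,  # assumed value, not taken from the paper
) -> List[List[int]]:
    """Greedily group the proposals of ONE class into spatial clusters.

    In each iteration the highest-scoring unassigned proposal becomes the
    centroid, and every unassigned proposal whose IoU with the centroid is
    at least `iou_thresh` joins its cluster. Clusters are returned as lists
    of proposal indices, centroid first.
    """
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters: List[List[int]] = []
    while remaining:
        centroid = remaining[0]
        members = [centroid] + [
            i for i in remaining[1:] if iou(boxes[centroid], boxes[i]) >= iou_thresh
        ]
        clusters.append(members)
        assigned = set(members)
        remaining = [i for i in remaining if i not in assigned]
    return clusters


if __name__ == "__main__":
    # Two spatially separated groups of overlapping proposals.
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (51, 51, 61, 61)]
    demo_scores = [0.9, 0.6, 0.3, 0.8]
    print(intra_class_clusters(demo_boxes, demo_scores))
```

Each returned cluster is meant to correspond to one candidate object instance, with its first index pointing at the centroid proposal. The paper's iterative procedure may differ in how clusters are refined between iterations.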

Highlights

  • In supervised object detection, bounding box annotations are required for training on multi-label images

  • Intersection over union (IoU) is the ratio between the intersection and the union of the predicted box and the ground-truth box. Mean average precision (mAP) is the average of the AP computed over all object classes

  • In this paper, we have proposed a weakly supervised object detection (WSOD) method based on complementary feature learning and instance clustering
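
The two evaluation quantities mentioned in the highlights, intersection over union (IoU) and mean average precision (mAP), can be written compactly as below. The symbols are notation assumed here for illustration and are not taken from the paper: $B_p$ is the predicted box, $B_{gt}$ the ground-truth box, $|\cdot|$ denotes area, $C$ is the number of object classes, and $\mathrm{AP}_c$ is the average precision for class $c$.

```latex
\mathrm{IoU}(B_p, B_{gt}) = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|},
\qquad
\mathrm{mAP} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c
```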


Summary

Introduction

In supervised object detection, bounding box annotations are required for training on multi-label images. Gathering ground-truth bounding boxes for natural images is the major limitation in real-world object detection applications, since it is a time-consuming and laborious task [1]. Applying weakly supervised learning (WSL) to object detection is an appropriate solution to the object annotation problem. Weakly supervised object detection (WSOD) refers to learning object detection with only image-level annotations [2], [3].

The associate editor coordinating the review of this manuscript and approving it for publication was Andrea F.
