Abstract

Weakly supervised object detection (WSOD) aims to train object detectors using only image-level annotations. Many recent works on WSOD adopt multiple instance detection networks (MIDN), which usually generate a certain number of proposals and treat proposal classification as latent model learning within image classification. However, these methods tend to detect salient objects, salient object parts, and clustered objects due to the lack of instance-level annotations during training. Thus, a core issue is how to guarantee that the network learns as many objects with precise bounding boxes as possible. In this paper, we address this issue by exploiting the potential of proposal scores during training. We propose an adaptive instance refinement (AIR) framework with three novel designs, which can be integrated with MIDN into a single network. Specifically, adaptive instance mining attempts to discover all positive instances according to the score distribution of proposals and their spatial similarity. Adaptive score modulation dynamically adjusts proposal scores to make the network focus more on instances with different difficulties in different training iterations. Adaptive knowledge refinement distills important information from all previous stages through a weighted average of proposal scores. Experimental results on the PASCAL VOC 2007 and 2012 benchmarks and the MS COCO benchmark demonstrate that AIR significantly improves the performance of the original MIDN and achieves state-of-the-art results.
