Abstract

Weakly-supervised object detection (WSOD) has attracted considerable attention in recent years. However, a large gap remains between WSOD and generic object detection. The main barriers to the efficiency of WSOD are ineffective data augmentation and inaccurate bounding box prediction. Given only image-level annotations, it is hard for WSOD to make effective use of varied data augmentations or to regress bounding boxes accurately. Although a fully-supervised object detector can be trained on annotations generated by a weakly-supervised object detector, its performance is still severely limited by the low quality of the mined pseudo annotations. This paper proposes an efficient WSOD method with pseudo annotations (EWPA) to make better use of imperfect annotations. With the assistance of pseudo annotations, EWPA can regress more accurate bounding boxes, whereas traditional WSOD can only locate the salient parts of an object. Furthermore, pseudo annotations make it possible to design more complex data augmentations, driving the network to learn more discriminative feature representations. Extensive experiments on the PASCAL VOC 2007 and 2012 datasets validate the effectiveness of EWPA.

Highlights

  • In recent years, the development of Convolutional Neural Networks (CNN) has significantly boosted the performance of many tasks in computer vision such as image classification [1]–[3], object detection [4]–[6] and semantic segmentation [7]–[9]

  • The training of a fully supervised detector relies heavily on datasets with precise instance-level annotations, which are costly in human labor to produce. To address this problem, we study Weakly Supervised Object Detection (WSOD), which needs only image-level annotations for training and greatly reduces the cost of labeling training data

  • We focus on making better use of pseudo annotations and fully explore varied data augmentations and the corresponding box regression

Summary

INTRODUCTION

In recent years, the development of Convolutional Neural Networks (CNN) has significantly boosted the performance of many tasks in computer vision such as image classification [1]–[3], object detection [4]–[6] and semantic segmentation [7]–[9].

TRADITIONAL MULTIPLE INSTANCE LEARNING

Due to the absence of instance-level annotations, most previous methods formulate weakly-supervised object detection as a Multiple Instance Learning (MIL) problem [17]. These approaches consider each image as a bag of candidate proposals, and each image is labeled as a positive or negative sample of a specific class. Because pseudo ground truths must be generated from weakly-supervised instance segmentation results to train the regression branch, they can also be used throughout the training process. For a newly added online instance classifier, the instance-level class labels used for training are generated from pseudo ground truths (called proposal clusters in [16]) obtained through the classification scores of the previous classifier. It will be removed if the maximum of IoUs is larger than 0.5.
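The 0.5 IoU rule mentioned above can be illustrated with a small NumPy sketch. The summary does not say which boxes are compared or what exactly is removed, so the code assumes, purely for illustration, that a candidate box is dropped when its maximum IoU with a set of reference boxes exceeds the threshold. The function names `pairwise_iou` and `keep_low_overlap` and the `[x1, y1, x2, y2]` box layout are hypothetical choices, not taken from the paper.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """Return an (N, M) matrix of IoU values between two box sets.

    Boxes are assumed to be [x1, y1, x2, y2] with x2 > x1 and y2 > y1.
    """
    a = np.asarray(boxes_a, dtype=float).reshape(-1, 4)[:, None, :]  # (N, 1, 4)
    b = np.asarray(boxes_b, dtype=float).reshape(-1, 4)[None, :, :]  # (1, M, 4)

    # Width and height of the intersection rectangle, clipped at zero
    # so that non-overlapping boxes contribute no area.
    inter_w = np.clip(
        np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0.0, None
    )
    inter_h = np.clip(
        np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0.0, None
    )
    inter = inter_w * inter_h

    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)


def keep_low_overlap(candidates, references, thresh=0.5):
    """Boolean mask over candidates: True means the box is kept.

    Assumption: a candidate is removed when its maximum IoU with any
    reference box is strictly larger than `thresh`.
    """
    iou = pairwise_iou(candidates, references)
    if iou.shape[1] == 0:
        # No reference boxes, so nothing can overlap: keep everything.
        return np.ones(iou.shape[0], dtype=bool)
    return iou.max(axis=1) <= thresh


candidates = [[0, 0, 10, 10], [0, 0, 10, 5], [20, 20, 30, 30]]
references = [[0, 0, 10, 10]]
mask = keep_low_overlap(candidates, references)
print(mask.tolist())  # [False, True, True]
```

The second candidate has an IoU of exactly 0.5 with the reference box and is kept, because the rule as stated removes a box only when the maximum IoU is larger than 0.5.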

REGRESSION BRANCH
EXPERIMENTS
DATASETS AND EVALUATION METRICS
IMPLEMENTATION DETAILS
Findings
CONCLUSION
