Abstract

This paper presents WALLACE, a new framework of deep convolutional neural networks, which perform ConvNet’s pyramidal feature hierarchy for weakly supervised learning. Most prior works rely on the image pyramid or network ensemble, which is both complicated and usually expensive. Instead, WALLACE is a more simple single-stage network that can predict objects present and location in an image without multiple rescale. Our model is trained efficiently using only global image-level labels, and it could generate meaningful multi-scale semantic feature maps by only one evaluation. Furthermore, a novel constrain-to-highlight loss is proposed to balances region selection among hierarchical feature maps, which additional improve model performance. Extensive experiments on object classification and weakly supervised pointwise object localization show that WALLACE achieves state-of-the-art results on the VOC 2007 and VOC 2012 benchmark without bells and whistles.

Highlights

  • In recent years, Convolutional Neural Networks (CNN) have emerged as the new state-of-the-art learning framework for various visual recognition tasks, e.g., image classification [2]–[5], image segmentation [6], object localization [7], and object detection [1], [8], [9]

  • We present a thorough evaluation of the WALLACE in VOC 2007 and VOC 2012 datasets for image classification and weakly supervised localization

  • We propose a different weakly supervised learning procedure, which is based on the single-shot multi-scale scheme

Read more

Summary

INTRODUCTION

Convolutional Neural Networks (CNN) have emerged as the new state-of-the-art learning framework for various visual recognition tasks, e.g., image classification [2]–[5], image segmentation [6], object localization [7], and object detection [1], [8], [9]. For object detection or localization, regular fully supervised training requires the object location or size annotations which demands a lot of workforces and material resources To reduce these cost of data annotation, some attempts [10]–[15] of Weakly Supervised Learning(WSL) of CNNs come up. In addition to the pyramid multi-scale features layers, we introduce a global multiscale pooling to extract highlight regions from these features layers These pooling compositions are modified from WELDON [11], see concatenate weakly-supervised prediction module part in the model section for details. Considering the entire prediction pipeline is a single network, WALLACE can be optimized end-to-end directly by the classification label; our model can predict class probabilities directly and get the classdepended feature heat map indirectly from full images in one evaluation Based on these feature heat maps, we can implement accurate WSL object localization. We will show that the pyramidal feature hierarchy can make effective use of multiscale evidence in the weakly supervised learning process while image pyramid or network ensemble methods fail

RELATED WORKS AND CONTRIBUTIONS
MULTI-SCALE EVIDENCE INTUITION
TRAINING PHASE
COMPUTATIONAL COMPLEXITY ANALYSIS
CLASSIFICATION EXPERIMENTS
Findings
LOCATION PREDICTION EXPERIMENTS
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.