Abstract

Training a convolutional neural network for object detection requires a large number of images with pixel-level annotations. Weakly supervised learning uses image-level labels to circumvent the shortage of such annotations, which remains an open challenge. This paper proposes a cascaded deep network architecture that leverages class activation mapping with global average pooling. The first stage learns to infer object localization maps from the image-level annotations and generates bounding boxes for the objects in each image; these bounding boxes yield image patches cropped from the original image. In the second stage, the image patches are used to train the detection network. Experiments are conducted on the PASCAL VOC 2012 dataset. The proposed method obtains a mean average precision of 87.2% and achieves classification performance competitive with state-of-the-art methods. In the evaluation of object localization, the recall of our method improves by 9%.
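As a rough illustration of the first-stage mechanism the abstract describes, the sketch below computes a class activation map from the feature maps preceding global average pooling and derives a bounding box from it. This is a minimal example assuming PyTorch; the threshold value, function names, and box extraction heuristic are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Compute a class activation map (CAM).

    feature_maps: (C, H, W) activations of the last conv layer.
    fc_weights:   (num_classes, C) weights of the classifier that
                  follows global average pooling.
    class_idx:    target class index.
    Returns an (H, W) map normalised to [0, 1].
    """
    # Weighted sum over channels: M_c(x, y) = sum_k w_k^c * f_k(x, y)
    cam = torch.einsum('c,chw->hw', fc_weights[class_idx], feature_maps)
    cam = F.relu(cam)                    # keep only positive evidence
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)      # normalise to [0, 1]

def cam_to_bbox(cam, image_size, threshold=0.2):
    """Upsample the CAM to image resolution, threshold it, and return the
    tight box around the activated region (a simple stand-in for the
    localization step; the 0.2 threshold is an assumed value)."""
    cam_up = F.interpolate(cam[None, None], size=image_size,
                           mode='bilinear', align_corners=False)[0, 0]
    ys, xs = torch.nonzero(cam_up > threshold, as_tuple=True)
    if len(xs) == 0:
        return None
    return (xs.min().item(), ys.min().item(),
            xs.max().item(), ys.max().item())
```

In a two-stage pipeline of this kind, the box returned by a routine like `cam_to_bbox` would be used to crop the image patch that is then fed to the second-stage detection network.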
