Abstract

Weakly supervised object classification and localization aim to learn object classes and locations using only image-level labels rather than bounding-box annotations. Conventional deep convolutional neural network (CNN)-based methods activate the most discriminative part of an object in the feature maps and then attempt to expand the activation to the whole object, which degrades classification performance. In addition, these methods use only the semantic information in the last feature map and ignore the role of shallow features. It therefore remains a challenge to enhance both classification and localization performance within a single framework. In this article, we propose a novel hybrid network, the deep and broad hybrid network (DB-HybridNet), which combines a deep CNN with a broad learning network to learn discriminative and complementary features from different layers and then integrates multilevel features (i.e., high-level semantic features and low-level edge features) in a global feature augmentation module. Importantly, we exploit different combinations of deep features and broad learning layers in DB-HybridNet and design an iterative training algorithm based on gradient descent so that the hybrid network works in an end-to-end framework. Through extensive experiments on the Caltech-UCSD Birds (CUB)-200 and ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2016 datasets, we achieve state-of-the-art classification and localization performance.
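The fusion idea in the abstract can be illustrated with a minimal NumPy sketch: multilevel features (stand-ins for high-level semantic and low-level edge features) are concatenated, expanded with random nonlinear enhancement nodes in the style of a broad learning system, and read out with ridge regression. All array sizes, the random features, and the ridge readout here are illustrative assumptions, not the paper's actual DB-HybridNet architecture or its gradient-descent training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for features pooled from different CNN layers:
# 32 images, 64-dim high-level semantic features, 16-dim low-level edge features.
deep_feats = rng.standard_normal((32, 64))
shallow_feats = rng.standard_normal((32, 16))
labels = np.eye(4)[rng.integers(0, 4, 32)]  # one-hot labels, 4 classes

# Global feature augmentation: concatenate the multilevel features.
X = np.concatenate([deep_feats, shallow_feats], axis=1)

# Broad-learning-style enhancement nodes: random nonlinear expansion.
W_enh = rng.standard_normal((X.shape[1], 40))
H = np.tanh(X @ W_enh)

# Augmented representation: original features plus enhancement nodes.
A = np.concatenate([X, H], axis=1)

# Ridge-regression output weights, as in standard broad learning systems.
lam = 1e-2
W_out = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ labels)
pred = (A @ W_out).argmax(axis=1)  # predicted class index per image
```

In an end-to-end hybrid as described above, the random expansion would be replaced by learned layers and the readout trained jointly with the CNN by gradient descent; this sketch only shows how deep and broad components compose.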
