Abstract

In the computer vision community, the general trend has been to capture and select discriminative features in order to improve performance. Recent work on attention mechanisms has proposed several attention blocks that adaptively recalibrate feature responses. However, most of them overlook context information at multiple scales. In this paper, we propose a simple yet effective building block for ResNeXt-style backbones, the discriminative local representation (DLR) module, which enables discriminative local representation learning over multi-scale feature information across multiple parallel branches. Our DLR module contains two sub-modules: a channel selective module (CSM) and a spatial selective module (SSM). Given an intermediate feature map, the CSM first selectively generates channel-wise attention maps and recalibrates the responses from the different branches according to a weight vector computed by a softmax layer. The SSM then captures spatially discriminative information at each scale and emphasizes interdependent channel maps. In addition, we introduce a high-order term into the multi-branch fusion and the residual connection to strengthen the nonlinearity of the structure. Multiple DLR modules can be stacked to form a deep convolutional network named DLRNet. To validate DLRNet, we conduct comprehensive experiments on classification benchmarks (i.e., CIFAR10, CIFAR100 and ImageNet-1K), as well as on two publicly available fine-grained datasets (i.e., CUB-200-2011 and Stanford Dogs). The experiments show consistent improvements over previous baseline models with reasonable overhead, and demonstrate the capability of the proposed method for discriminative local representation.
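To make the branch recalibration step concrete, the following is a minimal NumPy sketch of softmax-weighted fusion across parallel branches. It is an illustration under stated assumptions, not the authors' implementation: the abstract only says that branch responses are recalibrated by a weight vector from a softmax layer, so the pooling step, the per-branch projection `proj`, and the function name `branch_softmax_fusion` are hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def branch_softmax_fusion(branches, proj):
    """Fuse parallel branch outputs with channel-wise softmax weights.

    branches: array of shape (B, C, H, W), one feature map per branch.
    proj:     array of shape (B, C, C), a hypothetical per-branch linear map
              (an assumption; the abstract does not specify this layer).
    Returns the fused map of shape (C, H, W) and the weights of shape (B, C).
    """
    fused = branches.sum(axis=0)            # (C, H, W): aggregate all branches
    descriptor = fused.mean(axis=(1, 2))    # (C,): global average pooling
    logits = proj @ descriptor              # (B, C): one logit per branch and channel
    weights = softmax(logits, axis=0)       # softmax over the branch axis
    out = (weights[:, :, None, None] * branches).sum(axis=0)
    return out, weights


# Toy usage: 3 branches, 8 channels, 4x4 spatial resolution.
rng = np.random.default_rng(0)
branches = rng.standard_normal((3, 8, 4, 4))
proj = 0.1 * rng.standard_normal((3, 8, 8))
out, weights = branch_softmax_fusion(branches, proj)
```

Because the softmax runs over the branch axis, the weights for each channel sum to one, so the fused output is a per-channel convex combination of the branch responses.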

Highlights

  • Learning a coarse-to-fine feature representation is essential for vision tasks such as image classification, which recognizes the class label of an input sample.

  • Based on this observation, [12] proposed a multi-attention convolutional neural network (MA-CNN) that classifies an image by its individual parts, which are generated by clustering, weighting and pooling from spatially-correlated channels.

  • DLRNet architecture: we describe the details of the discriminative local representation (DLR) module.

Summary

INTRODUCTION

Learning a coarse-to-fine feature representation is essential for vision tasks such as image classification, which recognizes the class label of an input sample. To retain as much useful information as possible from a given image, we take inspiration and guidance from [6], [11], [12] and further observe that discriminative local representation can be complementary to multi-scale feature learning, and that the two can reinforce each other. An analogous line of work noted that discriminative feature representation can improve the accuracy of attention localization, and vice versa. Based on this observation, [12] proposed a multi-attention convolutional neural network (MA-CNN) that classifies an image by its individual parts, which are generated by clustering, weighting and pooling from spatially-correlated channels.

CHANNEL SELECTIVE MODULE
EXPERIMENT
CONCLUSION