Abstract

Attention mechanisms are rapidly gaining traction in the computer vision community. However, existing works focus on designing a single attention module and then employ that same module in every attention layer of a network, yielding sub-optimal performance. In this paper, we address a learning-to-attend problem by proposing Switchable Attention (SA), which learns to select different kinds of attention modules for different blocks of a Deep Neural Network (DNN). SA computes attention maps over three distinct scopes: local spatial attention (LSA), global spatial attention (GSA), and channel attention (CA). In particular, the introduced trainable parameters can be optimized with the network in an end-to-end manner. SA is a lightweight module that can be embedded in existing networks with little computational overhead. In quantitative experiments, SA boosts the performance of the baseline on various challenging benchmarks, such as CIFAR-100, ImageNet-1K, and MS COCO, and across computer vision tasks such as image classification and object detection. Notably, SA outperforms most attention methods with a ResNet-50 backbone on ImageNet-1K.
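To make the idea concrete, here is a minimal NumPy sketch of the switching mechanism the abstract describes: per-block trainable logits softmax-weight the outputs of three candidate attention scopes. The specific gating choices inside each candidate module (sigmoid gates, pooling shapes) are illustrative assumptions, not the paper's exact LSA/GSA/CA designs.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x):
    # x: (C, H, W). Global-average-pool per channel, gate with a sigmoid.
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))
    return x * w[:, None, None]

def global_spatial_attention(x):
    # One attention map over all H*W positions, shared across channels.
    a = softmax(x.mean(axis=0).reshape(-1)).reshape(x.shape[1:])
    return x * a[None] * a.size  # rescale so the mean gate is ~1

def local_spatial_attention(x, k=3):
    # Gate each position by a sigmoid of its local (k x k) neighborhood mean.
    _, H, W = x.shape
    p = k // 2
    m = np.pad(x.mean(axis=0), p, mode="edge")
    local = np.array([[m[i:i + k, j:j + k].mean() for j in range(W)]
                      for i in range(H)])
    gate = 1.0 / (1.0 + np.exp(-local))
    return x * gate[None]

def switchable_attention(x, logits):
    # `logits` (one per candidate module) stands in for the trainable
    # selection parameters that the paper optimizes end-to-end per block.
    w = softmax(np.asarray(logits, dtype=float))
    outs = [local_spatial_attention(x),
            global_spatial_attention(x),
            channel_attention(x)]
    return sum(wi * o for wi, o in zip(w, outs))
```

Because the switch is a soft, differentiable mixture, gradients flow to the selection logits during ordinary training; a block can then converge toward whichever scope suits it best:

```python
x = np.random.default_rng(0).standard_normal((4, 8, 8))
y = switchable_attention(x, [0.0, 0.0, 0.0])  # equal weights over the 3 scopes
print(y.shape)  # (4, 8, 8) -- output shape matches the input feature map
```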
