Abstract

A broad range of prior research has demonstrated that attention mechanisms offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing approaches either neglect to model attention jointly in the channel and spatial dimensions or do so at the cost of higher model complexity and a heavier computational burden. To resolve this dilemma, in this paper we propose multidimensional collaborative attention (MCA), a lightweight and efficient method that uses a three-branch architecture to simultaneously infer attention along the channel, height, and width dimensions at almost no additional cost. For the essential components of MCA, we develop an adaptive combination mechanism that merges dual cross-dimension feature responses in the squeeze transformation, enhancing the informativeness and discriminability of the feature descriptors, and we design a gating mechanism in the excitation transformation that adaptively determines the coverage of local feature interaction, resolving the trade-off between performance and computational overhead. MCA is simple yet general: it can be plugged into various classic CNNs as a plug-and-play module and trained together with the vanilla network in an end-to-end manner. Extensive image-recognition experiments on the CIFAR and ImageNet-1K datasets demonstrate the superiority of our method over state-of-the-art (SOTA) counterparts. In addition, we provide insight into the practical benefits of MCA by visually inspecting Grad-CAM++ visualization results. The code is available at https://github.com/ndsclark/MCANet.
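The abstract describes the mechanism only at a high level; the sketch below illustrates one plausible reading of it in PyTorch. It is not the authors' implementation (the linked repository is authoritative): the choice of average- and standard-deviation pooling as the dual squeeze responses, the learned mixing weight alpha, the kernel-size heuristic with parameters gamma and beta for the gating mechanism, and the averaging fusion of the three branches are all assumptions made for illustration.

```python
# Minimal sketch of a three-branch, cross-dimensional attention module.
# NOT the official MCA implementation; see https://github.com/ndsclark/MCANet.
import math
import torch
import torch.nn as nn


class AttentionGate(nn.Module):
    """One branch: squeeze the trailing axis with avg/std pooling, adaptively
    combine the two descriptors, then excite with a local 1D convolution whose
    kernel size (interaction coverage) is derived from the attended dimension.
    The gamma/beta heuristic here is an assumed choice, not the paper's."""

    def __init__(self, dim, gamma=2, beta=1):
        super().__init__()
        # Gating heuristic: larger dimensions get wider local interactions.
        k = int(abs((math.log2(dim) + beta) / gamma))
        k = k if k % 2 else k + 1  # kernel size must be odd
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learned mixing weight

    def forward(self, x):  # x: (B, D, L), D = attended dimension
        avg = x.mean(dim=2)                      # (B, D) average descriptor
        std = x.std(dim=2)                       # (B, D) std-dev descriptor
        a = torch.sigmoid(self.alpha)
        desc = a * avg + (1 - a) * std           # adaptive combination
        attn = torch.sigmoid(self.conv(desc.unsqueeze(1)))  # (B, 1, D)
        return x * attn.transpose(1, 2)          # broadcast over L


class MCASketch(nn.Module):
    """Three branches attend to channel, height, and width; their outputs
    are averaged (this fusion rule is an assumption for illustration)."""

    def __init__(self, channels, height, width):
        super().__init__()
        self.c_gate = AttentionGate(channels)
        self.h_gate = AttentionGate(height)
        self.w_gate = AttentionGate(width)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        xc = self.c_gate(x.flatten(2)).view(b, c, h, w)
        xh = self.h_gate(x.permute(0, 2, 1, 3).flatten(2))
        xh = xh.view(b, h, c, w).permute(0, 2, 1, 3)
        xw = self.w_gate(x.permute(0, 3, 1, 2).flatten(2))
        xw = xw.view(b, w, c, h).permute(0, 2, 3, 1)
        return (xc + xh + xw) / 3.0


# Usage: plug in after any convolutional block with a known feature-map size.
mca = MCASketch(channels=64, height=32, width=32)
y = mca(torch.randn(2, 64, 32, 32))  # output has the same shape as the input
```

Because the module preserves the input shape and adds only a handful of parameters per branch (one scalar mixing weight and one small 1D convolution), it matches the abstract's claim of near-zero overhead under these assumptions.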
