Abstract

Attention mechanisms in Convolutional Neural Networks (CNNs) adaptively recalibrate feature distributions by modeling corresponding attention masks. These masks are typically applied multiplicatively to the feature maps of each layer, producing different feature responses. However, such masks are usually developed independently for either the spatial module or the channel module, with little connection between the two, which leads to limited feature activation and localization. To this end, we present a Dual Fusion Attention (DFA) module that tunes the feature distribution by producing an attention mask that relies on a dual fusion of spatial location and channel information, so that every feature representation can adaptively enrich its discriminative regions and suppress the influence of background noise. Because the attention masks are computed only from a compressed combination of spatial and channel descriptors, the DFA module is lightweight, adding negligible extra computational complexity and parameters. Integrated into modern CNN models, image classification experiments demonstrate the superiority of the DFA module. Specifically, based on ResNet50, our method achieves a 1.16% improvement on the CIFAR100 benchmark and a 0.93% improvement on the ImageNet-200 benchmark.
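The abstract does not include implementation details, but the general idea it describes — compressing features into spatial and channel descriptors, fusing them into a single mask, and applying that mask multiplicatively — can be sketched as follows. This is a minimal NumPy illustration under assumptions (simple averaging for the descriptors, broadcast addition for the fusion, sigmoid squashing); the paper's actual DFA fusion is likely more elaborate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_fusion_attention(x):
    """Illustrative dual-fusion attention (not the paper's exact method).

    x: feature map of shape (C, H, W).
    Returns a recalibrated feature map of the same shape.
    """
    # Channel descriptor: global average pooling over spatial dims -> (C, 1, 1)
    channel_desc = x.mean(axis=(1, 2), keepdims=True)
    # Spatial descriptor: average over the channel dim -> (1, H, W)
    spatial_desc = x.mean(axis=0, keepdims=True)
    # Fuse the two compressed descriptors (broadcast sum) and squash to (0, 1);
    # broadcasting expands the fused mask back to the full (C, H, W) shape
    mask = sigmoid(channel_desc + spatial_desc)
    # Multiplicative recalibration of the original features
    return x * mask

features = np.random.randn(64, 8, 8)
out = dual_fusion_attention(features)
assert out.shape == features.shape
```

Because the mask is built only from pooled descriptors rather than from the full feature map, the extra computation and parameter count stay small, which matches the lightweight claim in the abstract.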
