Subtler mixed attention network on fine-grained image classification

Chao Liu, Lei Huang, Zhiqiang Wei, Wenfeng Zhang

doi:10.1007/s10489-021-02280-y

Abstract

The key to fine-grained image categorization is locating discriminative regions and extracting from them the features that correspond to subtle visual traits. Some current methods use an attention mechanism to identify the discriminative regions, but overlook the large amount of non-foreground noise that remains in these regions. In this work, we propose a Subtler Mixed Attention Network (SMA-Net), which contains two modules: 1) A discriminative region location module, which uses the channel attention mechanism to construct a feature pyramid network that locates the discriminative regions. It uses each region's positive effect on classification to screen a group of the most discriminative regions, and is trained through learning to rank. 2) A mixed attention module (MAM) for feature extraction, which can focus on subtler and more differentiated regions. We divide the feature map into intervals according to regions and learn attention features according to regional orientation. The attention maps are then multiplied with the input feature map for adaptive feature reinforcement. MAM is also a lightweight module that can be easily integrated into advanced networks without adding much computation. We validated SMA-Net through extensive experiments on Caltech-UCSD Birds (CUB-200-2011), Stanford Cars, CIFAR-10, Fish4Knowledge and Flower17. In particular, the accuracy on two widely used fine-grained datasets, CUB-200-2011 and Stanford Cars, reached 87.71% and 94.37%, respectively.
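To make the reweighting step concrete, the following is a minimal sketch of multiplying attention maps onto an input feature map. It is not the authors' MAM: the attention here is parameter-free (a sigmoid over pooled activations) rather than learned, the region-interval partition is not reproduced, and the function names `channel_weights`, `spatial_map` and `apply_mixed_attention` are hypothetical. The feature map is assumed to be a nested list of shape channels x height x width.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_weights(fmap):
    """One weight per channel: sigmoid of that channel's global average.

    Assumption: a parameter-free stand-in for a learned channel attention.
    """
    weights = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_map(fmap):
    """One weight per location: sigmoid of the mean across channels.

    Assumption: a parameter-free stand-in for a learned spatial attention.
    """
    n_channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            sigmoid(sum(fmap[c][i][j] for c in range(n_channels)) / n_channels)
            for j in range(width)
        ]
        for i in range(height)
    ]


def apply_mixed_attention(fmap):
    """Scale every activation by its channel weight and its spatial weight."""
    cw = channel_weights(fmap)
    sm = spatial_map(fmap)
    return [
        [
            [fmap[c][i][j] * cw[c] * sm[i][j] for j in range(len(sm[0]))]
            for i in range(len(sm))
        ]
        for c in range(len(fmap))
    ]


# Usage: a toy feature map with 2 channels of size 2 x 2.
toy = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
reweighted = apply_mixed_attention(toy)
```

The output has the same shape as the input, which is what lets a module of this kind be placed between existing layers of a backbone without changing the surrounding architecture.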

Full Text