Fine-grained Image Recognition Based on Attention Map and Image Sampling

Jingda Ma, Jinsheng Ji, Mingkang Xiong, Huilin Xiong

doi:10.1109/ihmsc49165.2020.10133

Abstract

Fine-grained image classification differs from general image classification in that the differences between fine-grained images are small, so the details of the images are extremely important. In this paper, we propose a network that retains both the overall and the local information of an image. The network works as follows. First, convolutional layers extract a feature map from the image. Next, a trilinear attention method processes the feature map to produce an average attention map and a single-channel attention map. Selective sampling then produces two sampled images according to these two attention maps. Finally, the original image and the two sampled images are fed into a convolutional neural network for classification. The entire network can be trained end-to-end. We conducted extensive experiments with this network on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, and the results demonstrate the effectiveness of the method.
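The attention and sampling steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give formulas, so everything here is an assumption: the attention follows the commonly cited trilinear form N(N(X) X^T) X with softmax normalization, the single-channel map is picked at random, and the sampler uses a simple separable inverse-transform scheme with nearest-neighbor lookup. The function names (`trilinear_attention`, `attention_maps`, `sample_by_attention`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def trilinear_attention(feat):
    """Map a (c, h, w) feature map to c attention maps of shape (h, w).

    Assumed form: N(N(X) X^T) X, where X is the (c, h*w) feature matrix
    and N is a softmax over the last axis.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)
    # (c, c) matrix of normalized inter-channel relationships.
    relation = softmax(softmax(x, axis=1) @ x.T, axis=1)
    # Mix the channels according to those relationships.
    return (relation @ x).reshape(c, h, w)


def attention_maps(feat, rng):
    """Return the average map and one single-channel map (random channel, assumed)."""
    att = trilinear_attention(feat)
    avg_map = att.mean(axis=0)
    single_map = att[rng.integers(att.shape[0])]
    return avg_map, single_map


def sample_by_attention(image, att_map, out_size):
    """Resample an image so that high-attention rows and columns get more pixels.

    Assumed scheme: take the max of the attention map along each axis as a
    marginal, then pick source rows/columns by inverting its cumulative sum.
    """
    H, W = image.shape[:2]
    h, w = att_map.shape
    # Nearest-neighbor upsampling of the attention map to the image size.
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    a = att_map[rows][:, cols]
    a = a - a.min() + 1e-6  # keep the weights strictly positive
    marg_y, marg_x = a.max(axis=1), a.max(axis=0)
    cdf_y = np.cumsum(marg_y) / marg_y.sum()
    cdf_x = np.cumsum(marg_x) / marg_x.sum()
    # Evenly spaced targets in (0, 1), mapped back through the inverse CDF.
    t = (np.arange(out_size) + 0.5) / out_size
    iy = np.searchsorted(cdf_y, t).clip(0, H - 1)
    ix = np.searchsorted(cdf_x, t).clip(0, W - 1)
    return image[iy][:, ix]


# Toy usage with random data standing in for CNN features and an input image.
rng = np.random.default_rng(0)
feat = rng.random((8, 4, 4))
image = rng.random((32, 32, 3))
avg_map, single_map = attention_maps(feat, rng)
sampled_avg = sample_by_attention(image, avg_map, 16)
sampled_single = sample_by_attention(image, single_map, 16)
```

In a full pipeline, the original image and the two sampled images would each be passed to a classification network. An end-to-end trainable version would need a differentiable sampler (for example bilinear grid sampling in a deep learning framework) rather than the integer indexing used in this sketch.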

Full Text