Abstract

Fine-grained image classification differs from general image classification in that the differences between fine-grained images are small, so image details are extremely important. In this paper, we propose a network that retains both the overall and the local information of an image. The network works as follows. First, convolutional layers extract a feature map from the image. Next, a trilinear attention method processes the feature map to produce an average attention map and a single-channel attention map. Selective sampling guided by these two attention maps then yields two sampled images. Finally, the original image and the two sampled images are fed to a convolutional neural network for classification. The entire network can be trained end-to-end. We conducted extensive experiments on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, and the results on all three demonstrate the effectiveness of the method.
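The attention step can be illustrated with a minimal NumPy sketch. The abstract does not give the formula, so this assumes the common trilinear attention formulation, softmax(softmax(X) Xᵀ) X, applied to a feature map X flattened to shape (channels, height × width). The function names, the toy feature map, and the random choice of the single channel are all hypothetical, and the selective sampling step is omitted.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def trilinear_attention(features):
    """Map a (c, h, w) feature map to c attention maps of shape (h, w).

    Assumed formulation: softmax(softmax(X) @ X.T) @ X with X of shape (c, h*w).
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w)
    # Channel-to-channel relation matrix, shape (c, c).
    relation = softmax(x, axis=1) @ x.T
    # Each output channel is a weighted mix of the input channels.
    maps = softmax(relation, axis=1) @ x
    return maps.reshape(c, h, w)


def attention_pair(features, rng):
    """Return the average map, one single-channel map, and its channel index."""
    maps = trilinear_attention(features)
    average_map = maps.mean(axis=0)
    # Assumption: the single channel is picked uniformly at random.
    channel = int(rng.integers(maps.shape[0]))
    return average_map, maps[channel], channel


rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))  # toy stand-in for a CNN feature map
average_map, single_map, channel = attention_pair(features, rng)
```

In this sketch the average map summarizes all channels, which corresponds to the overall image information, while the single-channel map highlights one local pattern. Both would then guide the sampling of the two extra input images.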
