Abstract

Fine-grained image classification differs from general image classification in that the differences between fine-grained images are small, so the details of an image are extremely important. In this paper, we propose a network that retains both the overall and the local information of an image. The network works as follows. First, convolutional layers extract a feature map from the image. Next, a trilinear attention method processes the feature map to obtain an average attention map and a single-channel attention map. Selective sampling guided by these two attention maps then produces two sampled images. Finally, the original image and the two sampled images are fed into a convolutional neural network for classification. The entire network can be trained end-to-end. We conducted extensive experiments with this network on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, and the results demonstrate the effectiveness of the method.
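
To make the attention step concrete, the following is a minimal NumPy sketch of how an average attention map and a single-channel attention map might be derived from a feature map. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not give the attention formula, so the sketch assumes a commonly used trilinear form, softmax(softmax(X) Xᵀ) X, applied to a feature map X of shape (channels, positions). The function names, the random choice of the single channel, and the 8×7×7 feature map are all hypothetical. The selective sampling step is omitted.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def trilinear_attention(features):
    """Map a (C, H, W) feature map to C attention maps of shape (H, W).

    Assumed form (not confirmed by the abstract):
        softmax(softmax(X) @ X.T) @ X, with X reshaped to (C, H*W).
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w)
    # Normalize each channel over its spatial positions.
    spatial = softmax(x, axis=1)
    # (C, C) matrix of inter-channel relations, normalized per row.
    relation = softmax(spatial @ x.T, axis=1)
    # Re-weight the feature channels by those relations.
    maps = relation @ x
    return maps.reshape(c, h, w)


def attention_pair(features, rng):
    """Return (average attention map, one single-channel attention map)."""
    maps = trilinear_attention(features)
    average_map = maps.mean(axis=0)
    # Assumption: the single channel is picked uniformly at random.
    channel = rng.integers(maps.shape[0])
    single_map = maps[channel]
    return average_map, single_map


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 7, 7))  # hypothetical C=8, H=W=7
avg_map, single_map = attention_pair(features, rng)
print(avg_map.shape, single_map.shape)
```

In a full pipeline, these two maps would guide the sampling that produces the two additional input images described above.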
