Abstract

Fine-grained image classification is challenging because intra-class variance is high while inter-class variance is low. Models based on bilinear pooling have proven effective for this task, but most previous approaches neglect the important role that distinctive features and discriminative regions play in solving the fine-grained problem. In this paper, we propose attention bilinear pooling, a convolutional neural network framework that combines bilinear pooling with attention for fine-grained classification. The framework learns distinctive feature information through channel attention or spatial attention. Specifically, channel and spatial attention allow the network to focus better on where the key targets are in the image, and we embed both in the underlying network architecture to better represent image features. To further explore the differences between channel and spatial attention, we propose four alternative frameworks: channel attention bilinear pooling (CAB), spatial attention bilinear pooling (SAB), channel spatial attention bilinear pooling (CSAB), and spatial channel attention bilinear pooling (SCAB). Experiments on several datasets show that the proposed method compares favorably with other methods based on bilinear pooling.

Highlights

  • As an important branch of artificial intelligence, computer vision deals with how computers can be made to gain a high-level understanding from digital images or videos, so as to complete object recognition [1,2,3], detection [4,5], classification [6,7], and other vision-related tasks

  • At the same time, we explored channel attention, spatial attention, channel spatial attention, and spatial channel attention double bilinear pooling to determine which is more useful for classification in the model and to study the difference between channel and spatial attention in the classification results

  • VGG-16 is often used as the primary model for fine-grained image classification because of its powerful generalization ability, so we focused on conv5_3 in VGG-16, combining it in turn with the channel attention module, the spatial attention module, and the double attention module
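
To make the four variants concrete, here is a minimal NumPy sketch of attention followed by bilinear pooling on a single conv5_3-style feature map. It is an illustration under stated assumptions, not the authors' implementation: the channel attention is assumed to be a squeeze-and-excitation style gate, the spatial attention is a simplified mask built from channel-wise average and max maps, the signed square root and L2 normalization follow common bilinear pooling practice, and reading CSAB as "channel then spatial" (and SCAB as the reverse) is an interpretation of the names. All function names, weight shapes, and the tiny 8x4x4 input are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Assumed SE-style gate: reweight each channel of a (C, H, W) map."""
    squeezed = x.mean(axis=(1, 2))            # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # bottleneck + ReLU -> (C // r,)
    gate = sigmoid(w2 @ hidden)               # per-channel weight in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x, w):
    """Simplified spatial mask (assumption): mix channel-wise avg and max maps."""
    logits = w[0] * x.mean(axis=0) + w[1] * x.max(axis=0)   # (H, W)
    return x * sigmoid(logits)[None, :, :]

def bilinear_pool(x):
    """Outer-product pooling of a (C, H, W) map into a C*C descriptor."""
    c, h, w = x.shape
    feats = x.reshape(c, h * w)
    b = (feats @ feats.T / (h * w)).reshape(-1)   # averaged outer products
    b = np.sign(b) * np.sqrt(np.abs(b))           # signed square root
    return b / (np.linalg.norm(b) + 1e-12)        # L2 normalization

# Hypothetical toy input standing in for a conv5_3 activation (real size: 512 channels).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = np.maximum(rng.standard_normal((C, H, W)), 0.0)   # ReLU-like activations
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
ws = rng.standard_normal(2)

cab = bilinear_pool(channel_attention(x, w1, w2))                          # CAB
sab = bilinear_pool(spatial_attention(x, ws))                              # SAB
csab = bilinear_pool(spatial_attention(channel_attention(x, w1, w2), ws))  # CSAB (assumed order)
scab = bilinear_pool(channel_attention(spatial_attention(x, ws), w1, w2))  # SCAB (assumed order)
```

Each variant produces a descriptor of length C*C that would feed a linear classifier; the only difference between them is which attention gates are applied, and in what order, before the outer product is taken.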


Summary

Introduction

As an important branch of artificial intelligence, computer vision deals with how computers can gain a high-level understanding of digital images or videos in order to complete object recognition [1,2,3], detection [4,5], classification [6,7], and other vision-related tasks. In coarse-grained image classification, the categories differ greatly from each other, there is no obvious subordinate relationship between them, and they are easy to distinguish. In fine-grained image classification, by contrast, the gap between classes is small, and the categories generally belong to different sub-categories under the same parent class. Fine-grained image classification is therefore more difficult than coarse-grained classification, for the following reasons. High intra-class variance exists due to uncertain factors such as pose, illumination, occlusion, and background interference.
