Abstract

Compared with ordinary image classification, fine-grained image classification is closer to real-life scenarios. The key challenge is to locate sufficiently discriminative local regions and to learn effective features from them. Building on the bilinear convolutional neural network (B-CNN), this paper designs a local importance representation convolutional neural network (LIR-CNN) consisting of three components. First, super-pixel segmentation convolution is used in the input layer of the model; it allows the model to accept images of different sizes and accounts for complex geometric deformations in the images. Second, the standard convolution of B-CNN is replaced with the proposed local importance representation convolution, which learns to score each local region of the image according to its importance. Finally, channelwise convolution is proposed to balance a lightweight network against classification accuracy. Experimental results on the benchmark datasets CUB-200-2011, FGVC-Aircraft, and Stanford Cars show that LIR-CNN performs well on fine-grained image classification tasks.
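Since LIR-CNN builds on B-CNN, it may help to recall how B-CNN pools features. The sketch below follows the standard B-CNN formulation: sum the outer products of two streams' feature vectors over all spatial locations, apply a signed square root, then L2-normalize. It uses plain Python lists for illustration. The function name and the list-of-vectors input format are assumptions, and the sketch shows only the baseline pooling, not LIR-CNN's modified layers.

```python
import math
from typing import List

Vector = List[float]


def bilinear_pool(feats_a: List[Vector], feats_b: List[Vector]) -> Vector:
    """B-CNN-style bilinear pooling (illustrative sketch).

    feats_a[l] and feats_b[l] are the feature vectors that two CNN streams
    produce at spatial location l; both streams must cover the same locations.
    Returns the flattened, normalized sum of outer products a_l b_l^T.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both streams must cover the same locations")
    ca, cb = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (ca * cb)
    # Orderless pooling: accumulate the outer product over every location.
    for a, b in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += a[i] * b[j]
    # Signed square root, then L2 normalization.
    pooled = [math.copysign(math.sqrt(abs(x)), x) for x in pooled]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled
```

Because the locations are summed out, the descriptor always has `ca * cb` entries, however many locations the feature map contains.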

Highlights

  • Compared with “LIR-CNN_V2”, the classification accuracy of “V3_DC+SC” increased by 0.4%, 0.5%, and 0.2%, respectively

  • We highlight that the importance of local image regions is learnable
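The highlight above refers to learning how important each local region is. The following minimal sketch illustrates that idea under explicit assumptions. Each location is scored from its k×k receptive field across all channels by a convolution with weights `w` and `bias`, a sigmoid maps the result to (0, 1), and the resulting score map multiplies every channel. The function names, the sigmoid gating, and the zero padding are hypothetical choices and are not the paper's exact local importance representation convolution. In a real network, `w` and `bias` would be trained by backpropagation.

```python
import math
from typing import List

Map2D = List[List[float]]   # [rows][cols]
FeatureMap = List[Map2D]    # [channels][rows][cols]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def importance_scores(fm: FeatureMap, w: FeatureMap, bias: float) -> Map2D:
    """Score each location from its k x k receptive field over all channels.

    w has shape [channels][k][k] with odd k (assumed learnable); zero padding
    keeps the score map the same size as the feature map.
    """
    channels, rows, cols = len(fm), len(fm[0]), len(fm[0][0])
    k = len(w[0])
    r = k // 2
    scores = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = bias
            for c in range(channels):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - r, j + dj - r
                        if 0 <= y < rows and 0 <= x < cols:
                            acc += w[c][di][dj] * fm[c][y][x]
            scores[i][j] = sigmoid(acc)
    return scores


def reweight(fm: FeatureMap, scores: Map2D) -> FeatureMap:
    """Scale every channel by the same importance map (enhance vs. suppress)."""
    return [[[v * s for v, s in zip(row, srow)] for row, srow in zip(ch, scores)]
            for ch in fm]
```

High-scoring regions pass through almost unchanged, while low-scoring regions are attenuated. This is one simple way to realize "enhance important regions, suppress noise" with a learnable score.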

Summary

Introduction

Fine-grained image classification has been one of the most popular research topics in computer vision and pattern recognition in recent years [1,2,3]. Locating informative local regions is a vital step in fine-grained image classification algorithms. We use the convolution operation to learn how important each local region of an image (i.e., each receptive field) is to the classification result. We then assign these importance factors to the corresponding image regions, enhancing the features of important regions and suppressing noise. In fine-grained image classification, reducing the number of model parameters too aggressively can prevent the network from learning sufficiently discriminative features. This paper therefore proposes a channelwise convolution approach that achieves a good balance between a lightweight model and classification accuracy.
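The summary does not define channelwise convolution. One established technique with that name, from ChannelNets, replaces the dense channel connections of a 1×1 convolution with a single small kernel that is shared and slid along the channel axis at every spatial location. The sketch below assumes LIR-CNN's layer works similarly, which may not be the case. The function names, "valid" padding, and the stride parameter are illustrative assumptions.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # [channels][rows][cols]


def channelwise_conv(fm: FeatureMap, kernel: List[float], stride: int = 1) -> FeatureMap:
    """Slide one shared 1-D kernel along the channel axis at every location.

    'valid' mode: out_channels = (in_channels - len(kernel)) // stride + 1.
    """
    c_in, rows, cols = len(fm), len(fm[0]), len(fm[0][0])
    k = len(kernel)
    if k > c_in or stride < 1:
        raise ValueError("need len(kernel) <= channels and stride >= 1")
    c_out = (c_in - k) // stride + 1
    out: FeatureMap = []
    for o in range(c_out):
        start = o * stride
        # Each output channel mixes k consecutive input channels, same weights everywhere.
        out.append([[sum(kernel[t] * fm[start + t][i][j] for t in range(k))
                     for j in range(cols)] for i in range(rows)])
    return out


def param_counts(c_in: int, c_out: int, k: int) -> Tuple[int, int]:
    """Weights of a 1x1 (pointwise) convolution vs. one shared channelwise kernel."""
    return c_in * c_out, k
```

For example, mapping 256 channels to 128 with a 1×1 convolution needs 32,768 weights, while a shared kernel of length 8 needs only 8. A reduction that aggressive is exactly what can cost discriminative power, which is why the paper frames channelwise convolution as a balance rather than a pure parameter cut.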

Weakly Supervised Classification Model
Lightweight Convolution Model
Local Importance Representation Convolution
Super-Pixel Segmentation Convolution
Channelwise Convolution
LIR-CNN Model
Experiments
Findings
Conclusions
