Fine-Grained Image Classification Based on Target Acquisition and Feature Fusion

Yan Chu, Lina Wang, Wen Shan, Zhengkui Wang, Qingchao Zhao

doi:10.1007/978-3-030-82153-1_18

Abstract

Fine-grained image classification aims to analyze visual objects from subordinate categories, e.g., models of cars or species of dogs. The task is challenging because inter-class variations are small and intra-class variations are large. Recent research has focused on extracting distinguishable local features via part-based models or attention networks that locate discriminative regions or parts. These approaches rely heavily on part annotations, incur high background noise, and ignore semantic information during learning. In this paper, we present a fine-grained image classification model, RPN-SCA-BCNN (RSCAB), based on target acquisition and feature fusion. In RSCAB, we first adopt a new Soft-NMS-enhanced Region Proposal Network (RPN) for weakly supervised object detection, which separates the distinguishable object from its background and thereby removes both the background noise and the need for costly annotations. Then, we propose a new attention mechanism, Spatial-Channel Attention (SCA), which focuses on the spatially discriminative parts of the image to reduce feature redundancy. Based on SCA, we further construct a Bilinear Convolutional Neural Network (BCNN) that fuses high- and low-dimensional features by leveraging two networks, obtaining richer multi-dimensional attention features for classification. The experimental results show that the proposed RSCAB model achieves higher classification accuracy than state-of-the-art algorithms.
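To make the Soft-NMS step mentioned in the abstract concrete, the following is a minimal sketch of the standard Gaussian Soft-NMS rescoring rule, in which a proposal overlapping an already selected box has its score decayed rather than being discarded outright. It is not the authors' enhanced variant, whose details the abstract does not give. The function names `iou` and `soft_nms`, the `(x1, y1, x2, y2)` box format, and the default values of `sigma` and `score_threshold` are all illustrative assumptions.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS (generic form; parameter defaults are assumptions).

    Returns a list of (box_index, final_score) pairs in selection order.
    """
    remaining = [(i, float(s)) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the proposal with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))

        # Decay the scores of the other proposals according to their overlap
        # with the selected box, instead of removing them as hard NMS would.
        rescored = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            decayed = score * math.exp(-(overlap ** 2) / sigma)
            if decayed >= score_threshold:
                rescored.append((idx, decayed))
        remaining = rescored
    return kept


if __name__ == "__main__":
    # Hypothetical proposals: two heavily overlapping boxes and one distant box.
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    for index, score in soft_nms(demo_boxes, demo_scores):
        print(index, round(score, 4))
```

In this toy input the second box overlaps the top-scoring box heavily, so its score is reduced and it is ranked below the distant third box, while still being retained as a candidate. Hard NMS with a typical overlap threshold would have dropped it entirely.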

Full Text