Abstract
Fine-grained image classification aims to analyze visual objects from subordinate categories, e.g., models of cars or species of dogs, and is challenging because inter-class variations are small while intra-class variations are large. Recent research has focused on extracting distinguishable local features via part-based models or attention networks that locate discriminative regions/parts. These approaches rely heavily on part annotations, incur high background noise, and ignore semantic information during learning. In this paper, we present a fine-grained image classification model, RPN-SCA-BCNN (RSCAB), based on target acquisition and feature fusion. In RSCAB, we first adopt a new Soft-NMS-enhanced Region Proposal Network (RPN) for weakly supervised object detection, which separates the distinguishable object from its background and thereby removes both the background noise and the need for costly annotation. Then, we propose a new attention mechanism, Spatial-Channel Attention (SCA), that focuses on the spatially discriminative parts of the image to reduce feature redundancy. Based on SCA, we further construct a Bilinear Convolutional Neural Network (BCNN) that fuses high- and low-dimensional features by leveraging two networks, obtaining more abundant multi-dimensional attention features for classification. The experimental results show that the proposed RSCAB model achieves higher classification accuracy than the state-of-the-art algorithms.
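To illustrate the Soft-NMS step mentioned above, the following is a minimal sketch of the commonly used Gaussian-decay form of Soft-NMS, in which proposals that overlap a selected box have their scores reduced rather than being discarded outright. The abstract does not state which Soft-NMS variant or parameters RSCAB uses, so the function names (`iou`, `soft_nms`), the box format `(x1, y1, x2, y2)`, and the values of `sigma` and `score_threshold` are assumptions made for illustration only.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS (illustrative; sigma and threshold are assumed values).

    Returns a list of (box_index, decayed_score) in the order boxes are selected.
    """
    candidates = [(score, idx) for idx, score in enumerate(scores)]
    kept = []
    while candidates:
        # Select the remaining proposal with the highest current score.
        best = max(candidates, key=lambda c: c[0])
        candidates.remove(best)
        best_score, best_idx = best
        kept.append((best_idx, best_score))

        # Decay the scores of the remaining proposals by their overlap with
        # the selected box, dropping those that fall below the threshold.
        decayed = []
        for score, idx in candidates:
            overlap = iou(boxes[best_idx], boxes[idx])
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_threshold:
                decayed.append((score, idx))
        candidates = decayed
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this toy input, the second box overlaps the first heavily, so its score is decayed and it is ranked below the non-overlapping third box instead of being removed, which is the behavior that distinguishes Soft-NMS from hard NMS.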