Abstract
Classifying fine-grained categories (e.g., bird species, car types, and aircraft types) is a crucial problem in image understanding and is difficult due to intra-class and inter-class variance. Most existing fine-grained approaches use the parts and local information of objects individually to improve classification accuracy, but they neglect feature fusion between the object (global) and its parts (local) as a way to reinforce fine-grained features. In this paper, we present a novel framework, the object–part registration–fusion Net (OR-Net), which considers the mechanism of registration and fusion between the features of an object (global) and those of its parts (local) for fine-grained classification. Our model learns fine-grained features from the global and local regions of the object and fuses these features through the registration mechanism to reinforce each region's characteristics in the feature maps. Specifically, OR-Net consists of: (1) a multi-stream feature extraction net, which generates features from the global region and various local regions of the object; and (2) a registration–fusion feature module, which calculates the dimension and location relationships between the global (object) region and the local (part) regions to generate registration information, and then uses this information to fuse the local features into the global features, producing the fine-grained feature. Experiments run on symmetric GPU devices with symmetric mini-batches verify that OR-Net surpasses state-of-the-art approaches on the CUB-200-2011 (Birds), Stanford-Cars, and Stanford-Aircraft datasets.
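The registration and fusion steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that registration means mapping a part's bounding box into the cells of the object's feature map, and that fusion is an element-wise addition of the resized part features at that location. The abstract does not name the fusion operator, so addition is an assumption, and the function names `register`, `resize_nearest`, and `fuse` are hypothetical.

```python
def register(part_box, object_box, fmap_h, fmap_w):
    """Map a part box (image pixels) onto cells of the object's feature map.

    Boxes are (x0, y0, x1, y1) and the part box is assumed to lie inside
    the object box. Returns (row0, col0, rows, cols) in feature-map cells.
    """
    ox0, oy0, ox1, oy1 = object_box
    px0, py0, px1, py1 = part_box
    obj_w, obj_h = ox1 - ox0, oy1 - oy0
    # Location relationship: offset of the part inside the object.
    col0 = (px0 - ox0) * fmap_w // obj_w
    row0 = (py0 - oy0) * fmap_h // obj_h
    # Dimension relationship: size of the part relative to the object.
    cols = max(1, (px1 - px0) * fmap_w // obj_w)
    rows = max(1, (py1 - py0) * fmap_h // obj_h)
    # Keep the placement inside the global feature map.
    rows = min(rows, fmap_h - row0)
    cols = min(cols, fmap_w - col0)
    return row0, col0, rows, cols


def resize_nearest(fmap, rows, cols):
    """Nearest-neighbour resize of a 2-D feature map (list of lists)."""
    src_h, src_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * src_h // rows][c * src_w // cols] for c in range(cols)]
        for r in range(rows)
    ]


def fuse(global_fmap, local_fmap, placement):
    """Add the resized local features into the registered global region.

    Element-wise addition is an assumed fusion operator.
    """
    row0, col0, rows, cols = placement
    resized = resize_nearest(local_fmap, rows, cols)
    fused = [row[:] for row in global_fmap]  # leave the input untouched
    for r in range(rows):
        for c in range(cols):
            fused[row0 + r][col0 + c] += resized[r][c]
    return fused


# Hypothetical example: an 80x80-pixel object with a 4x4 feature map,
# and a part occupying the object's top-right quadrant.
placement = register((40, 0, 80, 40), (0, 0, 80, 80), 4, 4)
global_fmap = [[1] * 4 for _ in range(4)]
local_fmap = [[10, 20], [30, 40]]
fused = fuse(global_fmap, local_fmap, placement)
```

In this toy case only the top-right 2x2 block of the global map changes, which is the sense in which a part's features reinforce the matching region of the object's features.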
Highlights
Fine-grained classification is the branch of image classification that focuses on distinguishing objects in subordinate classes with subtle differences from the base classes.
This study proposes a novel convolutional neural network, the object–part registration–fusion Net (OR-Net).
The whole-body stream and the parts streams each correspond to a distinct region of the object, and their inputs are cropped from the original image to provide more detail during feature extraction.
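As a rough illustration of how stream inputs might be taken from the original image, the sketch below crops one whole-body region and several part regions from the same image. The function names, the dictionary layout, and the box coordinates are hypothetical. The source does not say how the regions are obtained, so the boxes are simply assumed to be given.

```python
def crop(image, box):
    """Crop a 2-D image (list of rows) to box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def build_stream_inputs(image, object_box, part_boxes):
    """Return one input per stream, all cropped from the original image."""
    inputs = {"whole_body": crop(image, object_box)}
    for name, box in part_boxes.items():
        inputs[name] = crop(image, box)
    return inputs


# Hypothetical 4x6 image whose pixel value encodes its position (row*10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
inputs = build_stream_inputs(
    image,
    object_box=(1, 0, 5, 4),
    part_boxes={"head": (1, 0, 3, 2)},
)
```

Cropping every stream from the original image, rather than from a downsampled copy, is what lets the part streams see more detail than the whole-body stream would provide for the same region.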
Summary
Fine-grained classification is the branch of image classification that focuses on distinguishing objects in subordinate classes with subtle differences from the base classes. Some scholars collect optical images with surveillance cameras to recognize rainfall intensity [9,10], and others use satellite images for classification and prediction [11,12]. According to their CNN structures, we classify these studies into three categories: multi-stream, attention-location, and part-location approaches. Some studies use manual part annotations to provide part information for fine-grained image classification and use a multi-stream network to extract the features of each part (local features) in separate streams. Previous works design various convolutional neural networks around different factors, such as a multi-stream framework and part information, to generate discriminative feature descriptors for fine-grained image classification.
Figure panels: (a) original image; (b) without registration–fusion features; (c) with registration–fusion features.