Abstract

Fine-grained visual classification (FGVC) is challenging because of subtle inter-class variations. Locating key regions and extracting discriminative features are two crucial aspects of FGVC. In this paper, we approach FGVC from the perspectives of multi-granularity and multi-scale, explore the information available at different granularities and scales, and propose a novel method called Multi-Scale Network via Progressive Multi-Granularity Attention (MSMGA-Net), which locates the key discriminative regions while avoiding the recognition confusion caused by subtle inter-class variations. To locate discriminative regions, we adopt a progressive training strategy in which the network is trained in steps, and we construct a jigsaw puzzle generator (JPGen) that produces images of a different granularity for each training step. At each step, the multi-granularity attention (MGA) module at the corresponding stage of the network focuses on locating the key discriminative regions of the image at that step's granularity. To obtain discriminative features, we also construct a multi-scale convolution (MSC) module for feature extraction at each step, enabling the network to capture details at different scales. We validated MSMGA-Net through extensive experiments on CUB-200-2011, FGVC Aircraft and Stanford Cars, and the results show that it performs competitively. In particular, its accuracy on two competitive public benchmark datasets, CUB-200-2011 and FGVC Aircraft, reached 90.30% and 93.84%, respectively.
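The jigsaw puzzle generator can be illustrated with a minimal sketch. It assumes that JPGen cuts an image into an n x n grid of equal patches and shuffles them at random, which is how jigsaw generators are commonly built in earlier progressive multi-granularity work. The abstract does not confirm this. The function names `jigsaw_puzzle` and `progressive_inputs` and the granularity schedule (8, 4, 2, 1) are hypothetical. They are chosen only to show how one image yields a finer-to-coarser input for each training step. This is not the authors' implementation.

```python
from typing import List, Optional

import numpy as np

# Hypothetical schedule: one granularity per training step, finest first,
# ending with the unshuffled image (n = 1). Values are illustrative only.
GRANULARITY_SCHEDULE = (8, 4, 2, 1)


def jigsaw_puzzle(image: np.ndarray, n: int,
                  rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Split an H x W (x C) image into an n x n grid of patches and shuffle them.

    Hypothetical helper; the paper's JPGen may differ in its details.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    if h % n or w % n:
        raise ValueError("image height and width must be divisible by n")
    ph, pw = h // n, w // n
    # Cut the image into n*n patches in row-major order.
    patches = [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
               for r in range(n) for c in range(n)]
    # Draw a random permutation of patch positions and reassemble the image.
    order = rng.permutation(n * n)
    out = np.empty_like(image)
    for dst, src in enumerate(order):
        r, c = divmod(dst, n)
        out[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = patches[src]
    return out


def progressive_inputs(image: np.ndarray,
                       rng: Optional[np.random.Generator] = None) -> List[np.ndarray]:
    """Return one jigsaw image per training step, following the hypothetical schedule."""
    return [jigsaw_puzzle(image, n, rng) for n in GRANULARITY_SCHEDULE]
```

For example, a 448 x 448 RGB image passed to `progressive_inputs` would yield four images: one shuffled at 8 x 8, one at 4 x 4, one at 2 x 2, and the original image. Each would feed the training step whose attention module targets that granularity. Shuffling destroys the global layout but keeps local patches intact. This is why such inputs push each stage toward part-level cues of the matching size.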