Abstract

Fine-grained image classification is challenging because categories can only be distinguished by subtle, local differences. Existing weakly supervised fine-grained classification methods usually extract discriminative regions directly from the high-level feature maps. We observe that the stacking of local receptive fields in a convolutional neural network causes discriminative regions to spread across the high-level feature maps, which can lead to inaccurate region localization. In this paper, we propose an end-to-end Two-Level Attention Activation Model (TL-AAM) that solves the problem of discriminative region spreading and obtains more effective fine-grained features. Specifically, TL-AAM consists of: (1) an object attention activation module (OAAM), which links the correct classification score with object region localization through gradient reflow, so that the two reinforce each other and the object region is localized accurately; (2) a multi-scale pyramid attention localization module (MPALM), which locates local feature regions by selecting the region with the largest response value in each feature channel, allowing detailed features in local regions to be captured accurately; and (3) a local cross-channel attention module (LCAM), which filters irrelevant information in the high-level semantic feature maps by assigning higher weights to feature channels with high response values. Extensive experiments verify that TL-AAM achieves state-of-the-art performance, under the same settings as the most competitive approaches, on the CUB-200-2011, FGVC-Aircraft, and Stanford-Cars datasets.
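The abstract gives no implementation details, but two of the ideas it names can be illustrated concretely: weighting feature channels by their response strength (the LCAM idea) and localizing the spatial position with the largest response in the weighted maps (the single-scale core of the MPALM idea). The following is a minimal numpy sketch of these two steps under assumed conventions (a `(C, H, W)` feature tensor, softmax over per-channel peak responses, a fixed square window); it is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np

def channel_attention_weights(feature_maps):
    """Weight each channel by its softmax-normalised peak response,
    so channels with high response values dominate (LCAM-like idea).
    feature_maps: array of shape (C, H, W)."""
    peaks = feature_maps.reshape(feature_maps.shape[0], -1).max(axis=1)
    exp = np.exp(peaks - peaks.max())  # stable softmax over channel peaks
    return exp / exp.sum()

def locate_peak_region(feature_maps, weights, half=1):
    """Find the spatial peak of the attention-weighted response map and
    return a small window around it (single-scale MPALM-like idea).
    Returns (y0, y1, x0, x1) bounds, clipped to the map."""
    response = (weights[:, None, None] * feature_maps).sum(axis=0)  # (H, W)
    y, x = np.unravel_index(np.argmax(response), response.shape)
    h, w = response.shape
    return (max(0, y - half), min(h, y + half + 1),
            max(0, x - half), min(w, x + half + 1))
```

For example, a toy `(4, 8, 8)` feature tensor with a single strong activation at channel 2, position (5, 3) yields weights concentrated on channel 2, and the returned window is centred on (5, 3).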
