Weakly Supervised Fine-Grained Visual Recognition via Adversarial Complementary Attentions and Hierarchical Bilinear Pooling

Xiaofei Li, Mingwen Wang, Jianming Liu

doi:10.1007/978-3-030-36708-4_7

Abstract

Learning subtle and distinctive features is the key to fine-grained object recognition. Previous approaches use attention mechanisms to localize discriminative regions and learn fine-grained details, but they often treat each object-part attention independently and neglect the correlations among them, so that different attentions may end up focusing on the same distinctive features. In this paper, we propose a novel fine-grained visual recognition method that addresses this problem using an adversarial complementary attention (ACA) strategy and hierarchical bilinear pooling. Our method learns an attention-based classification branch through an attention module, and then trains a counterpart classifier branch with adversarial complementary attention parts by erasing from the feature maps the regions attended to by the first branch. To capture the interactions among the different attention features, a hierarchical bilinear pooling framework is used. Our method can be trained end-to-end without any bounding box or part annotations. Extensive experiments indicate that our approach is efficient and achieves state-of-the-art performance on widely used fine-grained classification datasets, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
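As a rough illustration of the two ideas named in the abstract, the following NumPy sketch erases the most attended spatial locations from a feature map and then pools the original and erased features bilinearly. It is a minimal sketch under stated assumptions, not the authors' implementation: the attention map is simply the channel-averaged activation, the erasing threshold of 0.6 is arbitrary, and plain cross bilinear pooling stands in for the hierarchical bilinear pooling framework. All function names are hypothetical.

```python
import numpy as np


def attention_map(features):
    """Channel-averaged activation, min-max scaled to [0, 1].

    features: array of shape (C, H, W). Assumption: the paper's attention
    module is learned; this average is only a stand-in.
    """
    a = features.mean(axis=0)
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo + 1e-8)


def erase_attended(features, attn, threshold=0.6):
    """Zero out locations whose attention exceeds `threshold` (assumed value)."""
    keep = (attn <= threshold).astype(features.dtype)
    return features * keep[None, :, :]


def bilinear_pool(x, y):
    """Cross bilinear pooling of two feature maps with equal spatial size.

    x: (C1, H, W), y: (C2, H, W). Returns a vector of length C1 * C2 after
    signed square root and L2 normalization.
    """
    c1, h, w = x.shape
    c2 = y.shape[0]
    xf = x.reshape(c1, h * w)
    yf = y.reshape(c2, h * w)
    b = (xf @ yf.T) / (h * w)           # average outer product over locations
    v = b.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))  # signed square root
    return v / (np.linalg.norm(v) + 1e-8)


rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))         # toy feature map: 8 channels, 4x4 grid
attn = attention_map(features)
complementary = erase_attended(features, attn)
descriptor = bilinear_pool(features, complementary)
print(descriptor.shape)
```

In this sketch the second branch would see only `complementary`, which forces it to rely on regions the first branch did not attend to; the pooled `descriptor` then combines the two views.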

Full Text