Abstract

Based on convolutional neural networks (CNNs), recent fine-grained visual classification (FGVC) methods locate the most discriminative parts, aggregate the most salient feature maps, and learn high-order encodings. These works have proposed various ways of interacting global and local information. Compared with extracting global information, however, tracing discriminative local parts of objects breaks receptive-field integrity and loses neighboring relationships. In complex scenarios, recent FGVC models still face the challenge of encoding whole objects integrally without sacrificing fine-grained local details. These factors motivate us to rethink rich feature utilization in the FGVC task. In this paper, we propose the Global Perception Attention Network (GPANet) for FGVC, a novel framework with a focal locator (FL) module and a global perception attention (GPA) module. The proposed network follows an end-to-end location-classification design paradigm for coarse-to-fine classification. The FL serves as a weakly supervised location module that aggregates the activation map and searches for the highest-response region. The GPA module is the core feature refinement module of our method, consisting of a modified global perception module (mGPM) and a squeeze-and-excitation (SE) channel attention block, assembled through a residual structure. We perform a module-by-module ablation study of GPANet to demonstrate its design effectiveness and modular flexibility. Benchmark experiments are conducted on three public FGVC datasets. The proposed method achieves competitive performance compared with the state of the art and shows favorable generalization across several fine-grained recognition tasks.
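The two mechanisms the abstract names can be sketched in a few lines: the FL module aggregates channel activations and picks the highest-response spatial position, while the GPA module applies SE channel attention and adds the result back through a residual connection. The NumPy toy below is only an illustration under assumed shapes; the function names, the bottleneck ratio `r`, and the random weights are all assumptions, the mGPM branch is omitted, and this is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def focal_locate(feat):
    """FL sketch: aggregate the (C, H, W) feature map over channels
    and return the coordinates of the highest-response position."""
    amap = feat.sum(axis=0)                       # (H, W) activation map
    return np.unravel_index(np.argmax(amap), amap.shape)

def se_block(x, w1, w2):
    """Squeeze-and-Excitation channel attention on a (C, H, W) map.
    Squeeze: global average pool over spatial dims -> (C,)
    Excitation: bottleneck FC -> ReLU -> FC -> sigmoid -> per-channel weights."""
    squeeze = x.mean(axis=(1, 2))                 # (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)        # (C // r,)
    weights = sigmoid(w2 @ hidden)                # (C,)
    return x * weights[:, None, None]             # reweight channels

def gpa_like_refine(x, w1, w2):
    """Residual assembly sketch: refined = x + SE(x).
    A hypothetical stand-in for the GPA module (mGPM branch omitted)."""
    return x + se_block(x, w1, w2)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                           # assumed toy sizes
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1       # squeeze FC weights
w2 = rng.standard_normal((C, C // r)) * 0.1       # excitation FC weights

peak = focal_locate(x)                            # (row, col) of strongest response
y = gpa_like_refine(x, w1, w2)
print(y.shape)  # (8, 4, 4) -- residual refinement preserves the shape
```

The residual form means the attention block only has to learn a correction on top of the identity mapping, which is why shape preservation in the sketch above is essential.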
