Abstract

Classifying the sub-categories of an object within the same super-category (e.g., bird species and car models) in fine-grained visual classification (FGVC) relies heavily on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this article, by contrast, we show that by integrating low-level information (e.g., color, edge junctions, texture patterns), performance can be improved through enhanced feature representation and accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of 1) a dual pathway hierarchy with a top-down feature pathway and a bottom-up attention pathway, which learns both high-level semantic and low-level detailed feature representations, and 2) an ROI-guided refinement strategy with ROI-guided dropblock and ROI-guided zoom-in operations, which refines features by enhancing discriminative local regions and suppressing background noise. The proposed AP-CNN can be trained end-to-end, without the need for additional bounding box or part annotations. Extensive experiments on three widely used FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach achieves state-of-the-art performance. Models and code are available at https://github.com/PRIS-CV/AP-CNN_Pytorch-master.
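To make the dual pathway idea concrete, the following is a minimal PyTorch sketch of a top-down feature pathway combined with a bottom-up attention pathway. All module and variable names (e.g., DualPathwayPyramid, the ResNet-50-like channel widths) are illustrative assumptions for this sketch and are not taken from the authors' released implementation; consult the linked repository for the actual AP-CNN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPathwayPyramid(nn.Module):
    """Sketch of a top-down feature pathway plus a bottom-up attention pathway.

    Hypothetical module; the authors' AP-CNN may differ in detail.
    """

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions, as in a standard feature pyramid
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        # 1x1 convolutions producing single-channel spatial attention maps
        self.attentions = nn.ModuleList(
            nn.Conv2d(out_channels, 1, kernel_size=1) for _ in in_channels
        )

    def forward(self, c3, c4, c5):
        # Top-down feature pathway: high-level semantics flow to finer levels
        p5 = self.laterals[2](c5)
        p4 = self.laterals[1](c4) + F.interpolate(p5, size=c4.shape[-2:], mode="nearest")
        p3 = self.laterals[0](c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")

        # Bottom-up attention pathway: low-level spatial cues propagate upward
        a3 = torch.sigmoid(self.attentions[0](p3))
        a4 = torch.sigmoid(self.attentions[1](p4)) * F.max_pool2d(a3, 2)
        a5 = torch.sigmoid(self.attentions[2](p5)) * F.max_pool2d(a4, 2)

        # Attention-weighted pyramid features
        return p3 * a3, p4 * a4, p5 * a5


if __name__ == "__main__":
    # Dummy backbone feature maps with ResNet-50-like channels and strides
    c3 = torch.randn(1, 512, 28, 28)
    c4 = torch.randn(1, 1024, 14, 14)
    c5 = torch.randn(1, 2048, 7, 7)
    p3, p4, p5 = DualPathwayPyramid()(c3, c4, c5)
    print(p3.shape, p4.shape, p5.shape)
```

In this sketch, semantic information flows top-down through lateral connections and upsampling, while spatial attention maps computed at the finest level flow bottom-up via pooling, so each pyramid level is reweighted by attention informed by low-level detail.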
