Abstract

We investigate the localization of subtle yet discriminative parts for fine-grained image recognition. Based on the observation that such parts typically exist within a hierarchical structure (e.g., from a coarse-scale "head" to a fine-scale "eye" when recognizing bird species), we propose a novel progressive-attention convolutional neural network (PA-CNN) to progressively localize parts at multiple scales. The PA-CNN localizes parts in two steps, where a part proposal network (PPN) generates multiple local attention maps, and a part rectification network (PRN) learns part-specific features from each proposal and provides the PPN with refined part locations. This coupling of the PPN and PRN allows them to be optimized in a mutually reinforcing manner, leading to improved pinpointing of fine-grained parts. Moreover, the convolutional parameters for a PPN at a finer scale can be inherited from the PRN at a coarser scale, enabling a rich part hierarchy (e.g., eye and beak in a bird's head) to be learned in a stacked fashion. Case studies show that PA-CNN can precisely identify parts without using bounding box/part annotations. In addition, quantitative evaluations demonstrate that PA-CNN yields state-of-the-art performance in three challenging fine-grained recognition tasks. i.e., CUB-200-2011, FGVC-Aircraft, and Stanford Cars.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.