Abstract

Fine-grained visual classification (FGVC) distinguishes sub-categories within a broad category, where intra-class variation is large and inter-class variation is small. Many current methods apply an attention mechanism to mine the most salient object parts and obtain more refined feature representations. However, attention mechanisms bring two limitations. First, attention usually focuses on the most salient parts of the object and ignores inconspicuous but discriminative parts. Second, using the features of each object part in isolation ignores the connections between different parts. To address the first limitation, we propose a feature re-attention module (FRAM) that produces enhanced feature representations for multiple specific object parts. To address the second, a multi-layer feature fusion module (MLFFM) fuses these enhanced part representations so that they learn semantically complementary information from each other. The proposed method can be trained end-to-end and requires no additional human annotation. Extensive experiments on three public fine-grained datasets demonstrate that our method achieves state-of-the-art performance.
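The abstract does not specify the internals of FRAM or MLFFM, but the overall pipeline it describes (attention-enhance each part's feature map, then fuse the enhanced representations) can be sketched in NumPy. Everything below is a minimal illustrative assumption, not the paper's actual architecture: the attention map, the enhancement rule, and the concatenation-based fusion are all placeholders.

```python
import numpy as np

def spatial_attention(feat):
    # feat: (C, H, W). A simple attention map: average over channels,
    # then softmax over spatial positions (an assumption; the paper's
    # FRAM details are not given in the abstract).
    a = feat.mean(axis=0)                       # (H, W)
    a = np.exp(a - a.max())
    return a / a.sum()

def enhance(feat):
    # Re-weight the feature map by its spatial attention map,
    # emphasizing attended object parts.
    return feat * spatial_attention(feat)[None, :, :]

def fuse(parts):
    # MLFFM-style fusion sketch: globally pool each enhanced part
    # representation and concatenate the resulting vectors so the
    # parts can be used jointly rather than in isolation.
    return np.concatenate([p.mean(axis=(1, 2)) for p in parts])

rng = np.random.default_rng(0)
# Three hypothetical part feature maps of shape (channels, H, W).
layers = [rng.standard_normal((8, 4, 4)) for _ in range(3)]
fused = fuse([enhance(f) for f in layers])
print(fused.shape)  # (24,)
```

In a real model the fused vector would feed a classifier head, and FRAM/MLFFM would be learned modules trained end-to-end as the abstract states.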
