Abstract

Fine-grained classification is challenging because discriminative features are hard to find and the regions that contain them are hard to localize. To address these challenges, we propose a novel visual classification framework based on a panoramic attention mechanism, which combines multiple attention networks to locate and identify features of greater semantic interest. First, building on a classical convolutional neural network, the global information of the image features is expressed by linear fusion. Second, a foreground attention branch further extracts the distinguishing details of the salient features. Third, a background attention branch mines additional features from the complementary object area to learn a more complete fine-grained feature representation. Finally, the three network branches are trained jointly to strengthen the network's ability to express representative features of fine-grained images. Our model can be viewed as a multi-branch network whose branches benefit one another and are optimized together. Experiments were conducted on the CUB-200-2011, Stanford Dogs, and FGVC-Aircraft datasets, with classification accuracy as the quantitative measure. The proposed method achieves the highest accuracy among the compared methods, with an average accuracy of 89.8%, and outperforms current advanced methods.
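The relationship between the foreground and background attention branches can be illustrated with a minimal sketch. The abstract does not say how the attention regions are computed, so everything below is an assumption made for illustration: the activation map is taken as the channel-wise mean of the feature maps, the foreground mask is obtained by thresholding that map at its mean value, and the background mask is simply its complement. The function names (`channel_mean`, `split_masks`, `apply_mask`) are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def channel_mean(feature_maps: List[Matrix]) -> Matrix:
    """Average C feature maps (each H x W) into a single H x W activation map."""
    c = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(fm[i][j] for fm in feature_maps) / c for j in range(w)]
        for i in range(h)
    ]


def split_masks(activation: Matrix) -> Tuple[Matrix, Matrix]:
    """Assumed rule: positions above the mean activation are foreground;
    the background mask is the complement of the foreground mask."""
    values = [v for row in activation for v in row]
    threshold = sum(values) / len(values)
    fg = [[1.0 if v > threshold else 0.0 for v in row] for row in activation]
    bg = [[1.0 - m for m in row] for row in fg]
    return fg, bg


def apply_mask(feature_maps: List[Matrix], mask: Matrix) -> List[Matrix]:
    """Element-wise multiply every channel by the same spatial mask."""
    return [
        [[v * m for v, m in zip(f_row, m_row)] for f_row, m_row in zip(fm, mask)]
        for fm in feature_maps
    ]


# Toy input: 2 channels, each 2 x 2 (values are made up for illustration).
features = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
activation = channel_mean(features)
fg_mask, bg_mask = split_masks(activation)
fg_features = apply_mask(features, fg_mask)  # input to a foreground branch
bg_features = apply_mask(features, bg_mask)  # input to a background branch
```

Because the two masks are complementary, every spatial position is seen by exactly one of the two attention branches, while the global branch sees the unmasked features. In a trained network the masks would more plausibly be soft and learned; the hard threshold here only keeps the sketch short.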
