Abstract

Fine-grained classification requires distinguishing images that belong to different subcategories within the same category, where the differences between highly similar images are subtle. Most existing methods rely only on a baseline network or a single attention module to extract discriminative features, which limits the model's ability to find fine-grained regions hidden in images. This article proposes an effective method to address this problem. First, a novel hierarchical training method enhances the feature extraction capability of the baseline model. Second, multiple attention regions are located based on the features extracted by the baseline model; here, an improved Long Short-Term Memory (LSTM) network and Multi-Head Attention focus on the key areas of the image, which aids the discovery of fine-grained features. Third, the attention features extracted by the dual network are spatially mapped using a multi-layer perceptron (MLP), and an interaction dot product over corresponding channels is then computed between the attention features and the mapped features to guide classification. Finally, we evaluate the method on several standard benchmark datasets: CUB-200-2011, FGVC Aircraft, and Stanford Cars.

Keywords: Data augmentation; Hierarchical training; Denoising autoencoder; Dual attention mechanism; Interactive attention
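The third step (MLP spatial mapping followed by a channel-wise interaction dot product) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-layer MLP, the tensor sizes, and the use of NumPy in place of a deep-learning framework are all assumptions made for clarity.

```python
import numpy as np

def mlp_map(features, w1, b1, w2, b2):
    """Spatially map attention features with a hypothetical 2-layer MLP (ReLU)."""
    h = np.maximum(0.0, features @ w1 + b1)  # hidden layer
    return h @ w2 + b2                       # project back to feature dim

def channel_interaction(attn, mapped):
    """Interaction dot product of corresponding channels.

    attn, mapped: (channels, dim). Each channel of the attention features
    is dotted with the same channel of the mapped features, giving one
    interaction score per channel that can guide classification.
    """
    return np.sum(attn * mapped, axis=1)

rng = np.random.default_rng(0)
C, D, H = 4, 8, 16  # channels, feature dim, hidden dim (illustrative sizes)
attn = rng.standard_normal((C, D))                    # dual-network attention features
w1, b1 = rng.standard_normal((D, H)), np.zeros(H)
w2, b2 = rng.standard_normal((H, D)), np.zeros(D)

mapped = mlp_map(attn, w1, b1, w2, b2)
scores = channel_interaction(attn, mapped)
print(scores.shape)  # one interaction score per channel
```

In practice these scores would feed a classification head; the sketch only shows the shape of the interaction.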
