Abstract

The key to fine-grained image classification is to find discriminative regions. Most existing methods only use simple baseline networks or low-recognition attention modules to discover object differences, which will limit the model to finding discriminative regions hidden in images. This article proposes an effective method to solve this problem. The first is a novel layered training method, which uses a new training method to enhance the feature extraction ability of the baseline model. The second step focuses on key regions of the image based on improved long short-term memory (LSTM) and multi-head attention. In the third step, based on the feature map obtained by the dual attention network, spatial mapping is performed by a multi-layer perceptron (MLP). Then the element-by-element mutual multiplication calculation of the channel is performed to obtain a feature map with finer granularity. Finally, the CUB-200-2011, FGVC Aircraft, Stanford Cars, and MedMNIST v2 datasets achieved good performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call