Abstract

Fine-grained image classification aims to divide coarse-grained categories into finer-grained subcategories. Most existing fine-grained methods use a single attention mechanism or multiple sub-networks to zoom in on distinguishable local feature regions. These models seldom explore the intrinsic connections between cross-layer features with similar semantics, and as a result they tend to perform erratically on images with complex backgrounds. To address this, we first propose a feature-semantic fusion module to enhance the diversity of global feature information. Second, we employ cross-layer spatial attention and channel attention modules to accurately locate the key local regions of an image. Finally, we propose a cross-gate attention module that extracts rich discriminative features from key object regions to guide the final classification. Experiments show that the proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
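
To make the roles of channel attention and spatial attention concrete, the following is a minimal, parameter-free sketch in plain Python. It is not the cross-layer or cross-gate design proposed here, whose details the abstract does not give. It assumes a single feature map stored as nested lists of shape C x H x W, and it assumes that attention weights come from simple average pooling followed by a sigmoid, with no learned layers. The function names (`channel_attention`, `spatial_attention`, `apply_attention`) are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Return one weight per channel ("which channels matter").

    Assumption: the weight is the sigmoid of the channel's global
    average; real modules usually insert learned layers here.
    """
    weights = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_attention(fmap):
    """Return an H x W map of weights ("where to look").

    Assumption: the weight at each location is the sigmoid of the
    mean activation across channels at that location.
    """
    c = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def apply_attention(fmap):
    """Rescale every activation by its channel and spatial weights.

    Both weight sets are computed from the unmodified input and then
    multiplied in; this is a simplification for illustration.
    """
    cw = channel_attention(fmap)
    sw = spatial_attention(fmap)
    return [
        [
            [fmap[k][i][j] * cw[k] * sw[i][j] for j in range(len(fmap[k][i]))]
            for i in range(len(fmap[k]))
        ]
        for k in range(len(fmap))
    ]


# Toy input: channel 0 is silent, channel 1 is uniformly active.
toy = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
refined = apply_attention(toy)
```

In this toy input the active channel receives a larger channel weight than the silent one, so its activations are suppressed less. A trained model would learn these weights instead of deriving them from raw averages.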
