Abstract

With the rapid development of deep learning, many deep-learning-based approaches have demonstrated outstanding performance on fine-grained visual categorization (FGVC). However, existing fine-grained datasets mainly consist of simple images, i.e., images in which the object occupies a large portion of the frame against a relatively clean background. This seriously restricts the application of FGVC in real-world scenarios. In this paper, we construct a fine-grained dataset named AIBD-Cars, which contains 28,471 car images with complex backgrounds belonging to 196 fine-grained classes. Furthermore, we propose a Location-Aware Channel-Spatial Attention Network (LCSANet), which jointly locates object regions and mines discriminative information to achieve better fine-grained visual categorization in complex scenes. We evaluate popular FGVC algorithms on AIBD-Cars to establish a benchmark. Extensive experiments show that our proposed method achieves a new state of the art on AIBD-Cars and FGVC Aircraft, and competitive results on CUB-200-2011 and Stanford Cars.
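To make the channel-spatial attention idea concrete, the sketch below shows a generic CBAM-style attention block in PyTorch. This is an illustrative assumption, not the authors' LCSANet: the module name, the reduction ratio, and the 7x7 spatial kernel are placeholders chosen for clarity, since the abstract does not specify the architecture's details.

```python
# Minimal sketch of channel-spatial attention (assumed CBAM-style design,
# NOT the authors' exact LCSANet module).
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: pool spatial dims, then re-weight each channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolve pooled channel statistics into a mask.
        self.conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))                 # (B, C)
        channel_w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        x = x * channel_w                                 # emphasize informative channels
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )                                                 # (B, 2, H, W)
        spatial_w = torch.sigmoid(self.conv(pooled))      # (B, 1, H, W)
        return x * spatial_w                              # highlight discriminative regions


# Usage: refine a backbone feature map before the classification head.
feat = torch.randn(2, 256, 14, 14)
refined = ChannelSpatialAttention(256)(feat)
```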
