Abstract

With the rapid development of deep learning, many deep-learning-based approaches have demonstrated outstanding performance on fine-grained visual categorization (FGVC). However, existing fine-grained datasets mainly consist of simple images, i.e., images in which the object occupies a large portion of the frame against a relatively clean background. This seriously restricts the application of FGVC in real-world scenarios. In this paper, we construct a fine-grained dataset named AIBD-Cars, which contains 28,471 car images with complex backgrounds belonging to 196 fine-grained classes. Furthermore, we propose a Location-Aware Channel-Spatial Attention Network (LCSANet), which jointly locates object regions and mines discriminative information to achieve better fine-grained visual categorization in complex scenes. We evaluate popular FGVC algorithms on AIBD-Cars to establish a benchmark. Extensive experiments show that our proposed method achieves a new state of the art on AIBD-Cars and FGVC Aircraft, and competitive results on CUB-200-2011 and Stanford Cars.
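To make the channel-spatial attention idea concrete, the sketch below shows a generic CBAM-style attention block in PyTorch. This is an illustrative assumption, not the authors' LCSANet: the module name, the reduction ratio, and the 7x7 spatial kernel are placeholders chosen for clarity, since the abstract does not specify the architecture's details.

```python
# Minimal sketch of channel-spatial attention (assumed CBAM-style design,
# NOT the authors' exact LCSANet module).
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: pool spatial dims, then re-weight each channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolve pooled channel statistics into a mask.
        self.conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))                 # (B, C)
        channel_w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        x = x * channel_w                                 # emphasize informative channels
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )                                                 # (B, 2, H, W)
        spatial_w = torch.sigmoid(self.conv(pooled))      # (B, 1, H, W)
        return x * spatial_w                              # highlight discriminative regions


# Usage: refine a backbone feature map before the classification head.
feat = torch.randn(2, 256, 14, 14)
refined = ChannelSpatialAttention(256)(feat)
```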
