Abstract

The availability of large-scale training data is a key factor in improving visual recognition performance. However, Fine-Grained Visual Categorization (FGVC) datasets are typically small, so overfitting is a persistent problem in FGVC training. Data mixing is a widely used augmentation method, but in most recently proposed data mixing methods, random patch selection can generate meaningless training samples and destabilize the training process. This paper proposes a data mixing augmentation strategy termed Semantic and Attentive Data Mixing (SADMix) that selects semantically meaningful patches to generate new training samples. In SADMix, a set of critical regions is localized according to convolutional activations, and an image patch is selected from these regions. The size, aspect ratio, and center location of each patch are varied according to random values drawn from a beta distribution, and these semantically informative patches are used to mix two training images. Guided by the class activation map (CAM), training images and their labels are mixed proportionally to produce new mixed samples and corresponding labels. SADMix is evaluated on three fine-grained datasets: CUB-200-2011, FGVC Aircraft, and Stanford Cars. The experimental results confirm the effectiveness of the proposed SADMix.
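The abstract's core idea, cutting a patch centered on a high-activation region (rather than a random location) and mixing labels in proportion to the pasted area, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' reference implementation: the function name `sadmix`, the use of the CAM's peak as the patch center, and the CutMix-style mapping from the beta sample to patch size are all illustrative choices.

```python
import numpy as np

def sadmix(img_a, img_b, label_a, label_b, cam_b, alpha=1.0, rng=None):
    """Illustrative SADMix-style mixing (not the paper's exact method):
    paste a patch of img_b, centered on the peak of its class activation
    map (CAM), into img_a; mix the labels by the pasted area ratio."""
    rng = rng or np.random.default_rng()
    H, W = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)              # mixing ratio ~ Beta(alpha, alpha)
    # CutMix-style patch size derived from lam (an assumed design choice)
    cut_h = int(H * np.sqrt(1.0 - lam))
    cut_w = int(W * np.sqrt(1.0 - lam))
    # center the patch on the strongest CAM activation of img_b,
    # instead of a uniformly random location
    cy, cx = np.unravel_index(np.argmax(cam_b), cam_b.shape)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, H)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, W)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # label weights are proportional to the actual pasted area,
    # so clipping at the border is accounted for
    area = (y2 - y1) * (x2 - x1) / (H * W)
    mixed_label = (1.0 - area) * label_a + area * label_b
    return mixed, mixed_label
```

For example, with one-hot labels the mixed label stays a valid probability vector, and the weight on each class reflects how much of the mixed image comes from that source.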
