Abstract

Real-world data often follows a long-tailed distribution, and deep learning models struggle to recognize infrequent (tail) classes amid the abundance of prevalent (head) ones. The fundamental issue is the scarcity of information available for tail classes, so an intuitive remedy is to uncover additional useful information specifically for them. We observe that the textual information in class names and the frequency-domain information of images have both been ignored by previous work on long-tailed visual recognition. We therefore propose a Text-Guided Fourier Augmentation (TGFA) method that leverages language models and the Fourier transform to excavate more useful information for tail classes. Extensive experiments demonstrate that TGFA enriches training data on-the-fly, enabling an end-to-end, one-stage supervised contrastive learning framework that surpasses other approaches, including two-stage and multi-expert methods, in both efficiency and performance.
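
The abstract does not detail TGFA's augmentation procedure, but injecting frequency-domain information into images is commonly realized via Fourier amplitude mixing, where an image's amplitude spectrum is blended with that of a reference image while its phase (which largely carries semantic structure) is kept. The sketch below is a minimal illustration of that generic step under these assumptions; the function name, the `ratio` parameter, and the choice of reference images are ours, not the paper's.

```python
# Illustrative sketch of Fourier amplitude mixing; NOT the paper's TGFA procedure.
import torch


def fourier_amplitude_mix(x_tail: torch.Tensor,
                          x_ref: torch.Tensor,
                          ratio: float = 0.5) -> torch.Tensor:
    """Blend the Fourier amplitude of a tail-class image with a reference image,
    keeping the tail image's phase."""
    # 2-D FFT over the spatial dimensions -> complex spectra.
    f_tail = torch.fft.fft2(x_tail, dim=(-2, -1))
    f_ref = torch.fft.fft2(x_ref, dim=(-2, -1))

    amp_tail, phase_tail = torch.abs(f_tail), torch.angle(f_tail)
    amp_ref = torch.abs(f_ref)

    # Interpolate amplitudes; the tail image's phase is preserved.
    amp_mixed = (1.0 - ratio) * amp_tail + ratio * amp_ref

    # Recombine amplitude and phase, then return to the spatial domain.
    f_mixed = amp_mixed * torch.exp(1j * phase_tail)
    return torch.fft.ifft2(f_mixed, dim=(-2, -1)).real


# Usage: augment a dummy batch of tail-class images with spectra from reference images.
tail_imgs = torch.rand(8, 3, 224, 224)
ref_imgs = torch.rand(8, 3, 224, 224)
augmented = fourier_amplitude_mix(tail_imgs, ref_imgs, ratio=0.3)
```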
