In image classification, debiasing aims to train a classifier that is less susceptible to dataset bias, i.e., the strong correlation between peripheral attributes of data samples and a target class. For example, even if the frog class in a dataset consists mainly of frog images with a swamp background (i.e., bias-aligned samples), a debiased classifier should still correctly classify a frog at a beach (i.e., a bias-conflicting sample). Recent debiasing approaches commonly employ two components, a biased model fB and a debiased model fD. fB is trained to focus on the bias-aligned samples (i.e., to overfit to the bias), while fD is trained mainly on the bias-conflicting samples by concentrating on samples that fB fails to learn, making fD less susceptible to the dataset bias. While state-of-the-art debiasing techniques have aimed to better train fD, we focus on training fB, a component that has been overlooked until now. Our empirical analysis reveals that removing the bias-conflicting samples from the training set of fB is important for improving the debiasing performance of fD. This is because the bias-conflicting samples, which do not contain the bias attribute, act as noisy samples when amplifying the bias in fB. To this end, we propose a simple yet effective data sample selection method that removes the bias-conflicting samples in order to construct a bias-amplified dataset for training fB. Our data sample selection method can be directly applied to existing reweighting-based debiasing approaches, yielding consistent performance gains and achieving state-of-the-art performance on both synthetic and real-world datasets.
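To make the role of the sample selection step concrete, the sketch below shows one simple way such a filter could be built on top of a biased model: samples that fB already classifies correctly with high confidence are kept as presumed bias-aligned, and the rest are dropped before retraining fB. The confidence threshold, the index-returning data loader, and the helper name select_bias_aligned are illustrative assumptions, not the paper's actual procedure.

```python
# Minimal sketch (assumed, not the paper's exact method): keep only samples that a
# deliberately biased model f_B classifies correctly with high confidence, treating
# them as bias-aligned, so the retraining set for f_B becomes bias-amplified.
import torch
from torch.utils.data import Subset

@torch.no_grad()
def select_bias_aligned(biased_model, loader, device, threshold=0.99):
    """Return dataset indices of samples presumed to be bias-aligned.

    `loader` is assumed to yield (sample_indices, images, labels) batches so the
    selected samples can be mapped back to the original dataset.
    """
    biased_model.eval()
    keep = []
    for indices, images, labels in loader:
        logits = biased_model(images.to(device))
        probs = torch.softmax(logits, dim=1)
        conf, pred = probs.max(dim=1)
        # Keep a sample when f_B is both correct and confident on it; samples that
        # f_B struggles with are treated as bias-conflicting and excluded.
        mask = (pred.cpu() == labels) & (conf.cpu() >= threshold)
        keep.extend(indices[mask].tolist())
    return keep

# Usage: build a bias-amplified subset and continue training f_B on it.
# aligned_idx = select_bias_aligned(f_B, train_loader_with_indices, device)
# bias_amplified_set = Subset(train_dataset, aligned_idx)
```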