Abstract

Active learning (AL) aims to select valuable samples for labeling from an unlabeled pool, building a training dataset at minimal annotation cost. Traditional methods require an initial set of labeled samples to start active selection and then query annotations incrementally over several iterations. This scheme is poorly suited to the deep learning scenario: an initial labeled set is not always available at the start, and model performance is typically weak in early iterations because training feedback is limited. For the first time, we propose a cold-start AL model based on representative sampling (CALR), which selects valuable samples without requiring an initial labeled set or iterative feedback from the target model. Experiments on three image classification datasets, CIFAR-10, CIFAR-100, and Caltech-256, show that CALR achieves a new state of the art for AL in cold-start settings. Under low annotation budgets in particular, our method yields up to a 10% performance gain over traditional methods. Furthermore, CALR can be combined with warm-start methods to improve start-up efficiency while further raising the performance ceiling of AL, giving CALR a broader range of application.
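The abstract does not detail CALR's internals, but the core idea of cold-start representative sampling can be illustrated with a minimal sketch: cluster the unlabeled pool in a feature space (e.g., embeddings from a pretrained or self-supervised encoder, an assumption here) and label the sample nearest each cluster center. The function name and parameters below are hypothetical, not the authors' API.

```python
# Illustrative sketch only: the paper's actual CALR procedure may differ.
# Assumes `features` are precomputed embeddings of the unlabeled pool.
import numpy as np
from sklearn.cluster import KMeans

def select_representative(features: np.ndarray, budget: int, seed: int = 0) -> np.ndarray:
    """Pick `budget` samples to label without any initial labeled set:
    cluster the pool into `budget` groups and return, for each group,
    the index of the sample closest to its centroid."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit(features)
    selected = []
    for c in range(budget):
        members = np.flatnonzero(km.labels_ == c)          # samples in cluster c
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        selected.append(members[np.argmin(dists)])         # nearest to centroid
    return np.asarray(selected)
```

Because selection depends only on the fixed embeddings, the whole budget can be queried in one shot, with no target-model training loop; the labeled set it produces can then seed a conventional warm-start AL method if desired.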
