Abstract

Gathering labeled data is one of the most time-consuming and expensive tasks in supervised machine learning. In practical applications, labeled training samples are usually quite limited, while unlabeled data are abundant and easy to collect. Semi-supervised learning and active learning are two important techniques for learning a discriminative classification model when labeled data are scarce. However, unlabeled data contaminated by significant noise and outliers cannot be well exploited and usually degrade the performance of semi-supervised learning, while active learning requires a reasonably strong initial classifier, which is hard to obtain from the very limited labeled training data. To address these issues, in this paper we propose a novel semi-supervised dictionary active learning (SSDAL) model, which integrates semi-supervised learning and active learning to effectively use all the training data. In particular, two criteria based on estimated class probability are designed to select, respectively, the unlabeled data with confident class estimates for semi-supervised learning and the informative unlabeled data for active learning. Extensive experiments demonstrate the superior performance of our method in classification applications, e.g., handwritten digit recognition, face recognition, and large-scale image classification.
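
To make the two kinds of selection criteria concrete, the following is a minimal sketch (not the authors' SSDAL implementation) assuming class probabilities for the unlabeled pool are estimated by some classifier: samples with high maximum probability are treated as confident and pseudo-labeled for the semi-supervised branch, while samples with a small margin between the top two class probabilities are treated as informative and queried for active labeling. The function name select_samples, the confidence threshold, and the query budget are illustrative assumptions.

import numpy as np

def select_samples(probs, conf_threshold=0.95, query_budget=10):
    """probs: (n_unlabeled, n_classes) estimated class probabilities."""
    sorted_probs = np.sort(probs, axis=1)
    top1 = sorted_probs[:, -1]
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]

    # Criterion 1: confident class estimates -> use as pseudo-labeled data
    # for the semi-supervised update.
    confident_idx = np.where(top1 >= conf_threshold)[0]
    pseudo_labels = probs[confident_idx].argmax(axis=1)

    # Criterion 2: smallest margin between the top two classes -> most
    # informative samples, sent to an annotator (active learning query).
    query_idx = np.argsort(margin)[:query_budget]

    return confident_idx, pseudo_labels, query_idx

# Example usage with synthetic "probabilities".
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
conf_idx, labels, query_idx = select_samples(probs)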
