Abstract

Recent studies have demonstrated that semi-supervised learning (SSL) approaches, which use both labeled and unlabeled data, are more effective and robust than those that use only labeled data. However, it is also well known that unlabeled data are not always helpful to SSL algorithms. Thus, various selection criteria have been proposed in the literature for choosing a small number of helpful unlabeled samples. One such criterion is based on the prediction of an ensemble classifier and the pairwise similarity between training samples. However, because this criterion relies only on distance information among the samples, it sometimes fails to work appropriately, particularly when the unlabeled samples lie near the decision boundary. To address this concern, a method of training semi-supervised support vector machines (S3VMs) using a modified version of the selection criterion employed in SemiBoost is investigated. In addition to the quantities of the original criterion, the confidence values of the unlabeled data are first computed using the estimated conditional class probability. Then, the unlabeled samples with higher confidences are selected and, together with the labeled data, used to retrain the ensemble classifier. Experimental results obtained on artificial and real-life benchmark datasets demonstrate that the proposed mechanism can compensate for the shortcomings of traditional S3VMs and, compared with previous approaches, achieves further improvements in classification accuracy.
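The selection-and-retraining loop described above can be sketched roughly as follows. This is a minimal, hypothetical illustration only, not the authors' implementation: it uses a single scikit-learn `SVC` with Platt-scaled probabilities as a stand-in for both the ensemble classifier and the estimated conditional class probability, and all function and parameter names (`select_and_retrain`, `n_select`, `n_rounds`) are invented for this sketch.

```python
import numpy as np
from sklearn.svm import SVC

def select_and_retrain(X_lab, y_lab, X_unlab, n_select=10, n_rounds=3):
    """Hypothetical sketch of confidence-based selection of unlabeled
    samples followed by retraining, loosely following the abstract."""
    clf = SVC(kernel="rbf", probability=True, random_state=0)
    clf.fit(X_lab, y_lab)
    for _ in range(n_rounds):
        if len(X_unlab) == 0:
            break
        # Estimated conditional class probabilities for the unlabeled data.
        proba = clf.predict_proba(X_unlab)
        conf = proba.max(axis=1)                     # confidence per sample
        pred = clf.classes_[proba.argmax(axis=1)]    # predicted labels
        # Select the unlabeled samples with the highest confidences ...
        idx = np.argsort(conf)[::-1][:n_select]
        # ... and add them, with their predicted labels, to the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[idx]])
        y_lab = np.concatenate([y_lab, pred[idx]])
        X_unlab = np.delete(X_unlab, idx, axis=0)
        clf.fit(X_lab, y_lab)                        # retrain on enlarged set
    return clf
```

Note that this sketch omits the similarity-based quantities of the original SemiBoost criterion; it only illustrates the added confidence-based filtering step.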
