Abstract

In supervised learning, algorithms infer a general prediction model from previously labeled data. In many real-world machine learning problems, however, labeled data are scarce while unlabeled data are abundant. The reliability of the learned model depends essentially on the size of the training set (the labeled data): if the amount of labeled data is too small, the generalization error of the learned model may be large. In such situations, a semi-supervised learning algorithm can improve the generalization performance of the model by integrating unlabeled data into the learning process. One of the most classical semi-supervised methods is self-training; an advantage of this method is that any traditional supervised learning algorithm can be used to build the model inside the self-training process. In this paper, the k-Nearest Neighbors (k-NN) classifier is chosen to make the labeling decisions during self-training. We also propose to reinforce the self-training strategy with a Support Vector Machine (SVM) classifier that helps the k-NN classifier label the unlabeled data. Experimental results show that self-training based on k-NN and SVM outperforms self-training based on the k-NN classifier alone.
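The combination described above can be illustrated with a minimal sketch. This is an assumed reading of the strategy, not the authors' exact procedure: in each self-training round, both a k-NN and an SVM classifier label the unlabeled pool, and a point is pseudo-labeled only when the two classifiers agree and the k-NN prediction is confident. The dataset, thresholds, and number of rounds below are illustrative choices.

```python
# Sketch of self-training where an SVM reinforces a k-NN labeler (assumed
# agreement-based scheme): pseudo-labels are kept only when both classifiers
# agree and the k-NN neighborhood is confident.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
labeled = rng.rand(len(y)) < 0.1            # keep only ~10% of the labels
X_l, y_l = X[labeled], y[labeled]           # small labeled training set
X_u = X[~labeled]                           # abundant unlabeled pool

for _ in range(5):                          # a few self-training rounds
    if len(X_u) == 0:
        break
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_l, y_l)
    svm = SVC().fit(X_l, y_l)
    p_knn = knn.predict(X_u)
    p_svm = svm.predict(X_u)
    conf = knn.predict_proba(X_u).max(axis=1)
    # Pseudo-label only where SVM confirms the k-NN decision (illustrative
    # confidence threshold of 0.9, i.e. a near-unanimous neighborhood).
    keep = (p_knn == p_svm) & (conf >= 0.9)
    if not keep.any():
        break
    X_l = np.vstack([X_l, X_u[keep]])
    y_l = np.concatenate([y_l, p_knn[keep]])
    X_u = X_u[~keep]
```

After the loop, the enlarged labeled set `(X_l, y_l)` is used to fit the final k-NN model. The agreement condition is one plausible way for the SVM to "help" the k-NN; other variants (e.g., using the SVM's decision margin as the confidence score) fit the same loop.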
