Abstract

In supervised learning, algorithms infer a general prediction model from previously labeled data. In many real-world machine learning problems, however, labeled data are scarce while unlabeled data are abundant. The reliability of the learned model depends essentially on the size of the training set (the labeled data): if the amount of labeled data is too small, the generalization error of the learned model may be large. In such situations, a semi-supervised learning algorithm can improve the generalization performance of the model by integrating unlabeled data into the learning process. One of the most classical semi-supervised methods is self-training; an advantage of this method is that any traditional supervised learning algorithm can be used to build the model inside the self-training process. In this paper, the k-Nearest Neighbors (k-NN) classifier is chosen to make the labeling decisions during self-training. We also propose to reinforce the self-training strategy with a Support Vector Machine (SVM) classifier that helps the k-NN classifier label the unlabeled data. Experimental results show that self-training based on k-NN and SVM outperforms self-training based on the k-NN classifier alone.
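The combination described above can be illustrated with a minimal sketch. This is an assumed reading of the strategy, not the authors' exact procedure: in each self-training round, both a k-NN and an SVM classifier label the unlabeled pool, and a point is pseudo-labeled only when the two classifiers agree and the k-NN prediction is confident. The dataset, thresholds, and number of rounds below are illustrative choices.

```python
# Sketch of self-training where an SVM reinforces a k-NN labeler (assumed
# agreement-based scheme): pseudo-labels are kept only when both classifiers
# agree and the k-NN neighborhood is confident.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
labeled = rng.rand(len(y)) < 0.1            # keep only ~10% of the labels
X_l, y_l = X[labeled], y[labeled]           # small labeled training set
X_u = X[~labeled]                           # abundant unlabeled pool

for _ in range(5):                          # a few self-training rounds
    if len(X_u) == 0:
        break
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_l, y_l)
    svm = SVC().fit(X_l, y_l)
    p_knn = knn.predict(X_u)
    p_svm = svm.predict(X_u)
    conf = knn.predict_proba(X_u).max(axis=1)
    # Pseudo-label only where SVM confirms the k-NN decision (illustrative
    # confidence threshold of 0.9, i.e. a near-unanimous neighborhood).
    keep = (p_knn == p_svm) & (conf >= 0.9)
    if not keep.any():
        break
    X_l = np.vstack([X_l, X_u[keep]])
    y_l = np.concatenate([y_l, p_knn[keep]])
    X_u = X_u[~keep]
```

After the loop, the enlarged labeled set `(X_l, y_l)` is used to fit the final k-NN model. The agreement condition is one plausible way for the SVM to "help" the k-NN; other variants (e.g., using the SVM's decision margin as the confidence score) fit the same loop.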
