Abstract

This paper shows how to use labeled and unlabeled data to improve inductive models with the help of transductive models. We propose a solution for the self-training scenario. Self-training is an effective semi-supervised wrapper method that can generalize any supervised inductive model to the semi-supervised setting: it iteratively refines an inductive model by bootstrapping from unlabeled data. Standard self-training uses the classifier (trained on the labeled examples) to label and select candidates from the unlabeled training set, which may be problematic because the initial classifier cannot be expected to provide highly confident predictions when labeled training data is scarce. As a result, it risks introducing too many wrongly labeled candidates into the labeled training set, which may severely degrade performance. To tackle this problem, we propose a novel self-training-style algorithm that incorporates a graph-based transductive model into the self-labeling process. Unlike standard self-training, our algorithm uses the labeled and unlabeled data as a whole to label and select unlabeled examples for training-set augmentation. We propose a robust transductive model based on a graph Markov random walk, which exploits the manifold assumption to output reliable predictions on unlabeled data from noisy labeled examples. The proposed algorithm greatly reduces the risk of performance degradation due to accumulated noise in the training set. Experiments show that the proposed algorithm can effectively exploit unlabeled data to improve classification performance.

Highlights

  • Traditional inductive models such as Naive Bayes, CARTs [1], and Support Vector Machines are usually trained in a supervised setting, which means they can only be trained on labeled data

  • Standard self-training uses the classifier (trained on the labeled examples) to label and select candidates from the unlabeled training set, which may be problematic because the initial classifier may not provide highly confident predictions when labeled training data is scarce

  • We show that incorporating transductive models into inductive models in the semi-supervised setting can improve classification performance


Summary

INTRODUCTION

Traditional inductive models such as Naive Bayes, CARTs [1], and Support Vector Machines are usually trained in a supervised setting, which means they can only be trained on labeled data. Standard self-training risks introducing too many wrongly labeled candidates into the labeled training set, which may severely degrade performance. Another drawback of self-training is that the newly added examples are not informative to the current classifier, since they can already be classified confidently [7]. In contrast, our transductive model naturally deals with noisy labeled data: it uses "label smoothing" to automatically adjust potentially wrong labels. By incorporating this transductive model into the self-training process, we expect any supervised inductive model it wraps to be greatly improved. We propose a novel self-training algorithm that employs a graph-based transductive model, using both labeled and unlabeled data to label and select unlabeled examples for training-set augmentation. We present the details of the proposed transductive graph-based model below.
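To make the overall procedure concrete, here is a minimal Python sketch of a self-training loop in which a transductive model, rather than the base classifier, performs the self-labeling. The `labeler` interface (a `fit_predict` returning labels and confidences), the confidence threshold `tau`, and the logistic-regression base learner are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train_with_transduction(X_l, y_l, X_u, labeler, n_iter=10, tau=0.9):
    """Self-training where a transductive model (not the base classifier)
    labels and selects unlabeled candidates. `labeler` is a hypothetical
    object with fit_predict(X_l, y_l, X_u) -> (labels, confidences)."""
    clf = LogisticRegression()  # stand-in for any supervised inductive model
    for _ in range(n_iter):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        # Transductive step: labeled and unlabeled data are used jointly.
        y_hat, conf = labeler.fit_predict(X_l, y_l, X_u)
        picked = conf >= tau  # keep only confidently labeled candidates
        if not picked.any():
            break
        X_l = np.vstack([X_l, X_u[picked]])
        y_l = np.concatenate([y_l, y_hat[picked]])
        X_u = X_u[~picked]
    return clf
```

Such a `labeler` could be instantiated by a graph-based random-walk model of the kind sketched in the next section.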

Markov Random Walk with Constraints
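For orientation, the following is a generic sketch of label propagation via a Markov random walk on a Gaussian-kernel similarity graph, the family of transductive models the paper builds on. The kernel bandwidth `sigma`, the number of walk steps `t`, and the hard clamping of labeled points are assumptions of this sketch; the paper's constrained formulation may differ.

```python
import numpy as np

def random_walk_propagation(X, y, labeled_mask, sigma=1.0, t=10):
    """Propagate labels via t steps of a Markov random walk on a
    Gaussian-kernel similarity graph (a generic sketch, not the
    paper's exact constrained formulation).
    y: int labels in {0..K-1} for labeled points, ignored elsewhere."""
    # Pairwise affinities W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    K = int(y[labeled_mask].max()) + 1
    F = np.zeros((len(X), K))
    F[labeled_mask, y[labeled_mask]] = 1.0  # seed labeled points
    for _ in range(t):
        F = P @ F                           # one random-walk step
        F[labeled_mask] = 0.0               # re-clamp labeled points each step
        F[labeled_mask, y[labeled_mask]] = 1.0
    conf = F.max(axis=1) / np.clip(F.sum(axis=1), 1e-12, None)
    return F.argmax(axis=1), conf
```

The manifold assumption enters through the graph: points connected by high-affinity paths receive similar label distributions, so predictions on unlabeled data vary smoothly along the data manifold.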
Problem Description and Notation
EXPERIMENTS AND DISCUSSION
Findings
CONCLUSIONS