Abstract

Transductive graph-based semisupervised learning methods usually build an undirected graph with both labeled and unlabeled samples as vertices. These methods propagate the label information of labeled samples to their neighbors through the edges in order to predict the labels of unlabeled samples. Most popular semisupervised learning approaches are sensitive to the initial label distribution, a problem that arises with imbalanced labeled datasets: in imbalanced classification, the class boundary is severely skewed toward the majority classes. In this paper, we propose a simple and effective approach that alleviates the unfavorable influence of the imbalance problem by iteratively selecting a few unlabeled samples and adding them to the minority classes, forming a balanced labeled dataset for the learning methods applied afterwards. Experiments on UCI datasets and the MNIST handwritten digits dataset show that the proposed approach outperforms existing state-of-the-art methods.

Highlights

  • In recent years, booming information technology has led to databases containing massive amounts of data in many different fields

  • The Iterative Nearest Neighborhood Oversampling (INNO) algorithm we propose in this paper converts a few unlabeled samples into labeled samples for the minority classes, constructing a balanced or approximately balanced labeled dataset for standard graph-based semisupervised learning (SSL) methods applied afterwards

  • The class boundary is severely skewed by the majority class in imbalanced between-class classification, as demonstrated by experiments on UCI datasets and MNIST digit recognition
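The iterative conversion step described above can be sketched roughly as follows. This is a minimal illustration under assumed details, not the paper's exact INNO procedure: it uses Euclidean nearest-neighbor distances, marks unlabeled samples with `-1`, and stops once every class has as many labeled samples as the majority class; the function name and stopping rule are illustrative.

```python
import numpy as np

def balance_by_nn_pseudolabeling(X, y):
    """Sketch of iterative nearest-neighbor oversampling: repeatedly move
    the unlabeled sample closest to a minority class's labeled set into
    that class, until every class matches the majority class size.
    y uses -1 to mark unlabeled samples."""
    y = y.copy()
    classes, counts = np.unique(y[y >= 0], return_counts=True)
    target = counts.max()
    for c, cnt in zip(classes, counts):
        for _ in range(target - cnt):
            unl = np.flatnonzero(y == -1)
            if unl.size == 0:          # no unlabeled samples left
                return y
            lab = np.flatnonzero(y == c)
            # distance from each unlabeled sample to its nearest
            # labeled sample of class c
            d = ((X[unl][:, None, :] - X[lab][None, :, :]) ** 2).sum(-1).min(1)
            y[unl[d.argmin()]] = c     # pseudo-label the closest one
    return y
```

On a toy set with three labeled majority samples, one labeled minority sample, and a few unlabeled samples, the sketch pulls the two unlabeled points nearest the minority cluster into the minority class, leaving the labeled set balanced.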


Introduction

Booming information technology has led to databases containing massive amounts of data in many different fields, and the need to mine their useful potential is inevitable. The target classes of most of these data records are unknown; such records are called unlabeled records, while records with specified target classes are called labeled records. In machine learning, semisupervised learning (SSL) methods [1] train a classifier by combining labeled and unlabeled samples, an approach that has attracted attention because it reduces the need for labeled samples and improves accuracy compared with most supervised learning methods. Although most existing methods have shown encouraging success in many applications, they assume that the distribution between classes in both the labeled and unlabeled datasets is balanced, which may not hold in reality [2]
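To make the transductive graph-based setting concrete, here is a minimal sketch of label propagation in the style of local-and-global-consistency methods. The Gaussian affinity, symmetric normalization, and parameter values are illustrative assumptions, not this paper's specific formulation; `-1` marks unlabeled samples.

```python
import numpy as np

def label_propagation(X, y, sigma=1.0, alpha=0.9, n_iter=100):
    """Transductive label propagation on a Gaussian-kernel graph.
    y: class labels, with -1 marking unlabeled samples.
    Returns a predicted label for every sample."""
    n = X.shape[0]
    classes = np.unique(y[y >= 0])
    # Gaussian affinity matrix over all samples (no self-loops)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}
    deg = W.sum(1)
    S = W / np.sqrt(np.outer(deg, deg))
    # One-hot seed matrix for the labeled samples
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0
    # Iterate F <- alpha * S F + (1 - alpha) * Y until (near) convergence
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return classes[F.argmax(1)]
```

With one labeled seed per well-separated cluster, the propagated labels fill in each cluster correctly; it is exactly this propagation step that an imbalanced labeled set can skew, since majority-class seeds dominate the diffusion.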
