Abstract

In typical classification problems, the data used to train a model for each class are usually correctly labeled, so fully supervised learning can be used. For example, many illustrative labeled data sets can be found at sources such as the UCI Repository for Machine Learning (http://archive.ics.uci.edu/ml/) or the Keel Data Set Repository (http://www.keel.es). However, a growing number of real-world classification problems involve data that contain both labeled and unlabeled samples. Unlabeled samples are assumed to be missing all class label information, and when used as training data they are considered to be of unknown origin (i.e., their actual class membership is completely unknown to the learning system). When presented with a classification problem containing both labeled and unlabeled training samples, a common approach is to discard the unlabeled data. In other words, the unlabeled data are not combined with the existing labeled data for learning, which can result in a poorly trained classifier that does not reach its full performance potential. The primary reason that unlabeled data are often not used for training is that, depending on the classifier, the optimal model for semi-supervised classification (i.e., a classifier that learns class membership from both labeled and unlabeled samples) can be far too complicated to develop. In previous work, the fusion of binary classifiers was shown to improve performance in multiclass classification problems. In that work, Bayesian methods were used to fuse the binary classifier outputs, while selecting the most relevant classifier pairs to improve the overall classifier decision space. Here, this work is extended by developing new algorithms for improving semi-supervised classification performance. Results are demonstrated with real data from the UCI and Keel Repositories.
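The contrast above, between discarding unlabeled samples and learning from them, can be illustrated with a small sketch. This is not the Bayesian binary-classifier fusion method described here, whose details the abstract does not give; it swaps in a generic semi-supervised technique, self-training (a hard-EM loop) around a nearest-centroid classifier. The function names `fit_centroids`, `predict`, and `self_train`, and the toy one-dimensional data, are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Nearest-centroid model: the mean feature vector of each class."""
    by_class = {}
    for x, y in zip(points, labels):
        by_class.setdefault(y, []).append(x)
    return {y: tuple(mean(col) for col in zip(*xs)) for y, xs in by_class.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def self_train(labeled, unlabeled, max_iter=10):
    """Generic self-training (not the authors' method): pseudo-label the
    unlabeled points with the current model, refit on labeled + pseudo-labeled
    data, and repeat until the pseudo-labels stop changing."""
    xs_l = [x for x, _ in labeled]
    ys_l = [y for _, y in labeled]
    centroids = fit_centroids(xs_l, ys_l)  # start from the labeled data only
    pseudo = None
    for _ in range(max_iter):
        new_pseudo = [predict(centroids, x) for x in unlabeled]
        if new_pseudo == pseudo:
            break  # converged: pseudo-labels are stable
        pseudo = new_pseudo
        centroids = fit_centroids(xs_l + list(unlabeled), ys_l + pseudo)
    return centroids


# Hypothetical toy data: one labeled sample per class, several unlabeled ones.
labeled = [((0.0,), "a"), ((10.0,), "b")]
unlabeled = [(2.0,), (3.0,), (4.0,), (9.0,), (8.0,)]

# Option 1: discard the unlabeled samples (purely supervised fit).
supervised_only = fit_centroids([x for x, _ in labeled], [y for _, y in labeled])
# Option 2: use them via self-training.
semi_supervised = self_train(labeled, unlabeled)

print(predict(supervised_only, (5.3,)))  # -> b
print(predict(semi_supervised, (5.3,)))  # -> a
```

In this toy case the unlabeled points pull the class-`a` centroid from 0.0 to 2.25 (and the class-`b` centroid from 10.0 to 9.0), so the test point 5.3 changes from `b` to `a`. The sketch only shows that unlabeled data can move the decision boundary; whether that is an improvement depends on true labels that the example does not claim to know.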
