Abstract

Self-labeled methods enlarge the labeled set by iteratively adding pseudo-labeled data drawn from the unlabeled set and labeled by the predictions of base classifiers. Mislabeling is a major challenge for these methods because their generalization performance depends on the pseudo-labels being correct. Existing self-labeled methods and frameworks employ data editing techniques based on traditional instance selection or on differential evolution to counter mislabeling. Nevertheless, they still suffer from the following issues: (a) data editing techniques based on traditional instance selection rely heavily on specific assumptions, and some do not extend easily to other self-labeled methods; (b) differential evolution may distort the original data distribution. To overcome mislabeling and these issues in existing solutions, a novel framework based on sample subspace optimization for self-labeled semi-supervised classification (SSO-SLSSC) is proposed. SSO-SLSSC is a wrapper framework that is highly compatible with most existing self-labeled methods. First, SSO-SLSSC can employ almost any self-labeled method to perform the iterative self-labeling process. Second, during this process, a binary particle swarm optimization-based sample subspace optimization (BPSOSSO) is proposed to select the subset of correctly predicted pseudo-labeled data from the newly predicted samples and to filter out the subset containing mislabeled data. Experimental results show that SSO-SLSSC outperforms two representative self-labeled frameworks and five advanced data editing techniques in overcoming mislabeling for four popular self-labeled methods, evaluated on extensive benchmark data sets from UCI and Kaggle with different ratios of initial labeled samples and noise.
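The core idea behind BPSOSSO is a binary particle swarm search over inclusion masks for newly pseudo-labeled samples: each particle encodes which samples to keep, and a fitness function scores the resulting subset. The sketch below is a generic binary PSO with a sigmoid transfer function, not the paper's exact BPSOSSO; the toy `fitness` function and the `true_correct` vector are hypothetical stand-ins for the paper's subset-quality criterion.

```python
import numpy as np

def bpso_select(fitness, n_items, n_particles=20, n_iters=50, seed=0):
    """Binary PSO: search for a 0/1 mask over n_items maximizing fitness(mask).

    Generic illustration (not the paper's BPSOSSO). Velocities are squashed
    through a sigmoid to give per-bit inclusion probabilities.
    """
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, size=(n_particles, n_items))
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_items))
    pbest = pos.copy()                                # per-particle best masks
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()          # swarm-wide best mask
    gbest_fit = pbest_fit.max()
    w, c1, c2 = 0.7, 1.5, 1.5                         # inertia, cognitive, social
    for _ in range(n_iters):
        r1 = rng.random((n_particles, n_items))
        r2 = rng.random((n_particles, n_items))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))             # sigmoid transfer function
        pos = (rng.random((n_particles, n_items)) < prob).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if fit.max() > gbest_fit:
            gbest = pos[fit.argmax()].copy()
            gbest_fit = fit.max()
    return gbest, gbest_fit

# Hypothetical ground truth: 1 = pseudo-label was correct, 0 = mislabeled.
true_correct = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])

def fitness(mask):
    # Reward keeping correctly labeled samples, penalize keeping mislabeled ones.
    return int((mask * true_correct).sum() - (mask * (1 - true_correct)).sum())

mask, best = bpso_select(fitness, n_items=10)
```

In the framework described by the abstract, the fitness would instead be derived from the self-labeled classifier itself (e.g., performance of a model retrained on the candidate subset), so that the kept subset approximates the correctly predicted pseudo-labeled data while the filtered-out remainder concentrates the mislabeled samples.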
