Abstract
Self-labeled methods enlarge the labeled set by iteratively adding pseudo-labeled data, i.e., unlabeled samples whose labels are predicted by base classifiers. Mislabeling is a major challenge for these methods, because their generalization performance depends on the pseudo-labeled data being predicted correctly. Existing self-labeled methods and frameworks employ data editing techniques, based on traditional instance selection or on differential evolution, to mitigate mislabeling. Nevertheless, they still suffer from the following issues: (a) data editing techniques based on traditional instance selection rely heavily on specific assumptions, and some do not extend easily to other self-labeled methods; (b) differential evolution may distort the original data distribution. To overcome mislabeling and these issues, a novel framework based on sample subspace optimization for self-labeled semi-supervised classification (SSO-SLSSC) is proposed. SSO-SLSSC is a wrapper framework that is highly compatible with most existing self-labeled methods. First, SSO-SLSSC can employ almost any self-labeled method to perform the iterative self-labeling process. Second, during this process, a binary particle swarm optimization-based sample subspace optimization (BPSOSSO) is proposed to select the subset of correctly predicted samples from the newly pseudo-labeled data and to filter out the mislabeled ones. Experimental results show that SSO-SLSSC outperforms 2 representative self-labeled frameworks and 5 advanced data editing techniques in overcoming the mislabeling of 4 popular self-labeled methods on extensive benchmark data sets from UCI and Kaggle, under different ratios of initial labeled samples and noise.
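To make the core idea concrete, the sketch below shows how binary PSO can search over bit-masks of newly pseudo-labeled samples, keeping a subset that maximizes a fitness score. This is an illustrative sketch only, not the paper's BPSOSSO: the fitness function here is a simple assumed proxy (agreement between each candidate's pseudo-label and the label of its nearest labeled neighbor), and the function name `bpso_select` and all parameters are hypothetical.

```python
import numpy as np

def bpso_select(candidates, pseudo_labels, X_lab, y_lab,
                n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over a bit-mask of candidate pseudo-labeled samples.

    Illustrative sketch, NOT the paper's BPSOSSO: fitness is an assumed
    proxy (1-NN agreement with the labeled set), not the paper's criterion.
    """
    rng = np.random.default_rng(seed)
    n = len(candidates)

    def fitness(mask):
        keep = mask.astype(bool)
        if not keep.any():
            return 0.0
        sel, lab = candidates[keep], pseudo_labels[keep]
        # Does the nearest labeled point agree with each pseudo-label?
        d = np.linalg.norm(sel[:, None, :] - X_lab[None, :, :], axis=2)
        return (y_lab[d.argmin(axis=1)] == lab).mean()

    # Particles are binary vectors: bit i == 1 keeps candidate i.
    pos = rng.integers(0, 2, size=(n_particles, n)).astype(float)
    vel = rng.uniform(-1, 1, size=(n_particles, n))
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    g = pbest_f.argmax()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))     # sigmoid transfer function
        pos = (rng.random((n_particles, n)) < prob).astype(float)
        f = np.array([fitness(p) for p in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest_f.argmax()
        if pbest_f[g] > gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    return gbest.astype(bool)  # True = keep this pseudo-labeled sample
```

In a wrapper framework of this kind, the returned mask would decide which newly pseudo-labeled samples are merged into the labeled set before the next self-labeling iteration, with the rest returned to the unlabeled pool.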