Abstract

The semi-supervised self-training method is one of the successful methodologies of semi-supervised classification and can train a classifier by exploiting both labeled data and unlabeled data. However, most of the self-training methods are limited by the distribution of initial labeled data, heavily rely on parameters and have the poor ability of prediction in the self-training process. To solve these problems, a novel self-training method based on density peaks and natural neighbors (STDPNaN) is proposed. In STDPNaN, an improved parameter-free density peaks clustering (DPCNaN) is firstly presented by introducing natural neighbors. The DPCNaN can reveal the real structure and distribution of data without any parameter, and then helps STDPNaN restore the real data space with the spherical or non-spherical distribution. Also, an ensemble classifier is employed to improve the predictive ability of STDPNaN in the self-training process. Intensive experiments show that (a) STDPNaN outperforms state-of-the-art methods in improving classification accuracy of k nearest neighbor, support vector machine and classification and regression tree; (b) STDPNaN also outperforms comparison methods without any restriction on the number of labeled data; (c) the running time of STDPNaN is acceptable.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call