Abstract

Self-training is one of the more successful approaches to semi-supervised classification. It exploits both labeled and unlabeled data to train a supervised classifier. Mislabeling is one of the largest challenges in self-training, and the most common technique for removing mislabeled samples is the local noise filter. However, the local noise filters currently used in self-training methods have two shortcomings: they depend on parameters, and they use only labeled data to remove mislabeled samples. To address these shortcomings, this paper proposes a novel self-training method based on density peaks and an extended parameter-free local noise filter (STDPNF). In STDPNF, the self-training method based on density peaks is redesigned so that it combines more readily with local noise filters. Moreover, a new local noise filter based on natural neighbors is proposed to filter out mislabeled instances. Unlike the local noise filters used in existing self-training methods, the filter in STDPNF is parameter-free and removes mislabeled samples by exploiting information from both labeled and unlabeled data. We focus on k nearest neighbor as the base classifier. In experiments, we verify that STDPNF improves the performance of the k nearest neighbor base classifier and that it removes mislabeled instances efficiently even when labeled data are insufficient.
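For readers unfamiliar with the setting, the generic self-training loop with a k nearest neighbor base classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the STDPNF algorithm: it contains neither the density-peaks ordering nor the natural-neighbor noise filter, and the unanimous-vote acceptance rule, the function names `knn_predict` and `self_train`, and the `threshold` parameter are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return (label, vote_share) from a majority vote of the k nearest labeled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def self_train(labeled, unlabeled, k=3, threshold=1.0):
    """Generic self-training (NOT STDPNF): pseudo-label confident points, then repeat.

    `labeled` is a list of (point, label) pairs and `unlabeled` a list of points,
    where each point is a tuple of floats. A point is accepted only when the share
    of agreeing neighbors reaches `threshold` (an assumption of this sketch).
    """
    train = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Score every remaining unlabeled point against the current training set.
        scored = [(x, *knn_predict(train, x, k)) for x in pool]
        confident = [(x, label) for x, label, share in scored if share >= threshold]
        if not confident:
            break  # nothing passes the threshold; leave the rest unlabeled
        train.extend(confident)
        accepted = {x for x, _ in confident}
        pool = [x for x in pool if x not in accepted]
    return train, pool
```

Any pseudo-label accepted in this loop is trusted permanently, so a single wrong label can propagate through later iterations. That is the mislabeling problem the abstract describes, and it is the step at which a local noise filter would be applied.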
