Abstract

In many practical data mining applications, such as web page classification, unlabeled training examples are readily available while labeled ones are expensive to obtain. Semi-supervised learning algorithms such as Tri-training have therefore attracted much attention. However, some unlabeled examples are inevitably mislabeled during the learning process, and these errors limit the performance gains of the learned hypothesis. To address this problem, this paper constructs a novel human cognitive paradigm for semi-supervised learning. Specifically, based on the local distribution of the feature space, the majority voting scheme is replaced by an estimate of the probability that a sample belongs to a given class, which serves as an efficient data editing strategy. The method considers the form of the underlying probability distribution in the neighborhood of a point in order to identify and remove mislabeled data. The proposed method is validated through extensive experiments. The results demonstrate that, compared with Tri-training, the proposed method exploits unlabeled data more effectively and more stably to enhance learning performance.
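The abstract does not specify the probability estimator, so the following is only a minimal sketch of neighborhood-based data editing, not the authors' method. It assumes an inverse-distance-weighted k-nearest-neighbor estimate of the probability that a sample belongs to its assigned class, and drops samples whose estimate falls below a threshold. The function names, `k`, and `threshold` are hypothetical choices made for illustration.

```python
import numpy as np

def local_class_probability(X, y, k=3):
    """Estimate, for each sample, the probability of its assigned label
    from its k nearest neighbors (assumed inverse-distance weighting)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Indices and distances of the k nearest neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    neighbor_dist = np.take_along_axis(dist, neighbors, axis=1)
    # Closer neighbors count more; the small constant avoids division by zero.
    weights = 1.0 / (neighbor_dist + 1e-12)
    agrees = y[neighbors] == y[:, None]
    return (weights * agrees).sum(axis=1) / weights.sum(axis=1)

def edit_labels(X, y, k=3, threshold=0.5):
    """Return a boolean keep-mask and the per-sample probability estimates.
    Samples with an estimate below `threshold` are treated as mislabeled."""
    prob = local_class_probability(X, y, k=k)
    return prob >= threshold, prob
```

In a Tri-training style loop, a filter of this kind would be applied to newly pseudo-labeled examples before they are added to a classifier's training set. Unlike a plain majority vote over neighbors, the weighted estimate lets nearby points influence the decision more than distant ones.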
