Human cognitive paradigm and its application in semi-supervised learning

Chun Zhang, Dongsheng Li, Junan Yang, Aixia Yong

doi:10.1016/j.ijleo.2013.07.136

Abstract

In many practical data mining applications, such as web page classification, unlabeled training examples are readily available, but labeled ones are expensive to obtain. Semi-supervised learning algorithms such as Tri-training have therefore attracted much attention. However, mislabeling unlabeled data during the learning process is unavoidable, and the resulting label noise limits the performance gain of the learned hypothesis. To address this problem, this paper constructs a novel human cognitive paradigm for semi-supervised learning. Specifically, based on the local distribution of the feature space, the majority voting scheme is replaced by an estimate of the probability that a sample belongs to a given class, which serves as an efficient data editing strategy. The method considers the form of the underlying probability distribution in the neighborhood of a point in order to identify and remove mislabeled data. The proposed method is validated with extensive experiments. The results demonstrate that, compared with Tri-training, the method exploits unlabeled data more effectively and more stably to improve learning performance.
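The data editing idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the estimator, so everything here is an assumption made for illustration: the function names `local_class_probability` and `edit_labels`, the parameters `k` and `threshold`, and the choice to estimate the class probability as the fraction of the k nearest neighbours that share the sample's assigned label. It is not the authors' implementation.

```python
import numpy as np


def local_class_probability(X, y, k=3):
    """Estimate, for each sample, the probability of its assigned label.

    Assumption (not from the paper): the probability is approximated by the
    fraction of the k nearest neighbours (Euclidean distance, self excluded)
    that carry the same label as the sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # A sample must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbours of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Fraction of neighbours whose label agrees with the sample's label.
    agree = y[neighbours] == y[:, None]
    return agree.mean(axis=1)


def edit_labels(X, y, k=3, threshold=0.5):
    """Return a boolean mask that is True for samples to keep.

    Samples whose estimated local class probability falls below the
    (hypothetical) threshold are treated as likely mislabeled.
    """
    return local_class_probability(X, y, k) >= threshold
```

In a Tri-training style loop, a filter of this kind would be applied to the newly pseudo-labeled examples before they are added to a classifier's training set, so that samples whose label disagrees with their neighbourhood are discarded instead of being accepted by vote alone.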

Full Text