Abstract

Self-training has been widely studied for semi-supervised classification. One of its greatest challenges is finding high-confidence unlabeled samples at each iteration. Although many variants of self-training have been developed, their strategies for finding high-confidence unlabeled samples rely heavily on parameters and use labeled data alone, so these methods are limited by the distribution of the labeled data. To address these issues, a novel natural neighborhood graph-based self-training method (NaNG-ST) is proposed. NaNG-ST first constructs a parameter-free natural neighborhood graph (NaNG), which roughly reveals the true data distribution by exploiting both unlabeled and labeled data. On the NaNG, homogeneous and heterogeneous edges are defined to divide the unlabeled samples into three cases. These edges let NaNG-ST use the revealed distribution, rather than the labeled data alone, to find confident unlabeled samples quickly and effectively without any parameters; they also keep NaNG-ST from being constrained by the distribution of the initial labeled data. When a few initial labeled samples cannot roughly represent the distribution of the whole data set, the NaNG helps NaNG-ST recover the true data distribution. Extensive experiments on real-world data sets show that NaNG-ST outperforms seven popular semi-supervised self-training approaches in terms of classification accuracy, mean F-measure, and running time.
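The abstract's key building block is a parameter-free natural neighborhood graph. The paper's exact construction is not reproduced here, so the sketch below follows the standard natural-neighbor idea (grow the neighborhood size until every point has a reverse neighbor or the number of isolated points stabilizes, then connect mutual neighbors); the stopping rule and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def natural_neighbor_graph(X):
    """Illustrative parameter-free natural-neighbor search.

    Grows the neighborhood rank r one step at a time; stops when every
    point has at least one reverse neighbor, or when the count of points
    without reverse neighbors stops shrinking. Returns the mutual-kNN
    edge set at the stable rank, plus the rank itself.
    """
    n = len(X)
    # Pairwise Euclidean distances; each row sorted by distance (self dropped).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(d, axis=1)[:, 1:]

    reverse_count = np.zeros(n, dtype=int)   # how often each point is chosen
    knn = [set() for _ in range(n)]          # current neighbor sets
    prev_isolated = -1
    r = 0
    while r < n - 1:
        for i in range(n):
            j = order[i, r]
            knn[i].add(j)
            reverse_count[j] += 1
        isolated = int((reverse_count == 0).sum())
        r += 1
        if isolated == 0 or isolated == prev_isolated:
            break
        prev_isolated = isolated

    # Natural neighbors = mutual k-NN pairs at the stable rank.
    edges = {(i, j) for i in range(n) for j in knn[i]
             if i in knn[j] and i < j}
    return edges, r
```

On a toy set of two well-separated clusters, this yields edges only inside each cluster, which is the property NaNG-ST exploits when labeling edges as homogeneous or heterogeneous.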
