Abstract

Semi-supervised classification is one of the core methods for dealing with incomplete label information without manual intervention, and it has been widely used in real-world problems for its excellent performance. However, existing algorithms need to store all the unlabeled instances and use them repeatedly during iteration. A large unlabeled set may therefore result in slow execution and large memory requirements. Many efforts have been devoted to this problem, but they have mainly focused on supervised classification. We propose an approach that decreases the size of the unlabeled instance set for semi-supervised classification algorithms. In this algorithm, we first divide the unlabeled instance set into several subsets with an information granulation mechanism, then sort the subsets according to their contribution to the classifier. Following this order, the subsets that contribute most to classification performance are retained. The proposed algorithm is compared with state-of-the-art algorithms on 12 real datasets, and the experimental results show that it achieves similar predictive ability while having the lowest instance storage ratio.
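
The three steps described above (divide, rank, retain) can be illustrated with a minimal sketch. The abstract does not specify the granulation mechanism or the contribution measure, so everything concrete here is an assumption made for illustration: plain k-means stands in for information granulation, a nearest-centroid classifier stands in for the semi-supervised learner, and a subset's contribution is taken to be the validation accuracy after pseudo-labeling that subset. The names `granulate`, `contribution`, `reduce_unlabeled`, and `n_keep` are hypothetical.

```python
import random
from math import dist


def centroids(points_by_label):
    """Mean vector of each class; classes with no points are skipped."""
    return {
        label: tuple(sum(c) / len(pts) for c in zip(*pts))
        for label, pts in points_by_label.items() if pts
    }


def predict(cents, x):
    """Nearest-centroid prediction for a single point."""
    return min(cents, key=lambda label: dist(cents[label], x))


def granulate(unlabeled, k, iters=10, seed=0):
    """ASSUMPTION: plain k-means stands in for information granulation."""
    rng = random.Random(seed)
    centers = rng.sample(unlabeled, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in unlabeled:
            nearest = min(range(k), key=lambda i: dist(centers[i], x))
            groups[nearest].append(x)
        # Recompute each center; an empty group keeps its old center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def contribution(granule, labeled, validation):
    """ASSUMPTION: contribution = validation accuracy after pseudo-labeling
    this granule with the base classifier and refitting the centroids."""
    base = centroids(labeled)
    augmented = {label: list(pts) for label, pts in labeled.items()}
    for x in granule:
        augmented[predict(base, x)].append(x)
    cents = centroids(augmented)
    hits = sum(predict(cents, x) == y for x, y in validation)
    return hits / len(validation)


def reduce_unlabeled(unlabeled, labeled, validation, k=4, n_keep=2):
    """Divide into granules, rank by contribution, keep the top n_keep."""
    granules = granulate(unlabeled, k)
    ranked = sorted(
        granules,
        key=lambda g: contribution(g, labeled, validation),
        reverse=True,
    )
    return [x for g in ranked[:n_keep] for x in g]


# Toy usage: two classes on a line, 40 unlabeled points in between.
labeled = {0: [(0.0, 0.0), (1.0, 0.0)], 1: [(9.0, 0.0), (10.0, 0.0)]}
validation = [((0.5, 0.0), 0), ((2.0, 0.0), 0), ((8.0, 0.0), 1), ((9.5, 0.0), 1)]
unlabeled = [(i * 0.25, 0.0) for i in range(40)]
kept = reduce_unlabeled(unlabeled, labeled, validation)
print(f"storage ratio: {len(kept) / len(unlabeled):.2f}")
```

The ratio printed at the end corresponds to the instance storage ratio mentioned above: the fraction of the unlabeled set that is retained after reduction. In this sketch the number of retained subsets is fixed by `n_keep`; a stopping rule based on classifier performance would be an equally plausible reading of the abstract.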
