Abstract

In many real applications, data contain features of different types, and label information is available for only some of the objects. Such data are referred to as partially labeled mixed-type data. Few feature selection approaches have been developed for these data. Motivated by this gap, this paper aims to select an informative feature subset from partially labeled mixed-type data. First, to improve classification performance, an improved label propagation algorithm based on K-nearest neighbors is proposed; it assigns decision labels to unlabeled objects by exploiting the relationships between unlabeled and labeled objects. On this basis, a multi-criteria feature measure based on dependency, information entropy, and information granulation is proposed for selecting candidate features. Finally, a corresponding semi-supervised feature selection algorithm is developed to select a feature subset from partially labeled mixed-type data. Experimental results on UCI data sets demonstrate the effectiveness of the proposed feature selection algorithm and its superior classification accuracy compared with other algorithms.
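
To make the label propagation step concrete, the following is a minimal sketch of plain K-nearest-neighbor label assignment on mixed-type data. It is not the improved algorithm proposed in the paper, whose details the abstract does not give. The function names, the one-pass majority vote, and the distance (0/1 overlap for categorical features, range-normalized difference for numeric ones) are all assumptions chosen for illustration.

```python
from collections import Counter


def mixed_distance(a, b, categorical, ranges):
    """Illustrative mixed-type distance (an assumption, not the paper's measure):
    0/1 overlap for categorical features, range-normalized gap for numeric ones."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in categorical:
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y) / ranges[j] if ranges[j] > 0 else 0.0
        total += d * d
    return total ** 0.5


def knn_propagate(X, y, categorical, k=3):
    """Give each unlabeled object (label None) the majority label of its
    k nearest labeled objects. One pass only; a simplification."""
    # Range of every numeric feature, used to normalize numeric differences.
    ranges = {}
    for j in range(len(X[0])):
        if j not in categorical:
            column = [row[j] for row in X]
            ranges[j] = max(column) - min(column)

    labeled = [i for i, label in enumerate(y) if label is not None]
    result = list(y)
    for i, label in enumerate(y):
        if label is not None:
            continue
        # Rank labeled objects by distance to object i and keep the k closest.
        nearest = sorted(
            labeled,
            key=lambda t: mixed_distance(X[i], X[t], categorical, ranges),
        )[:k]
        votes = Counter(y[t] for t in nearest)
        result[i] = votes.most_common(1)[0][0]
    return result


# Toy data: feature 0 is numeric, feature 1 is categorical.
X = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.2, "b"), (1.1, "a"), (5.1, "b")]
y = ["p", "p", "q", "q", None, None]
completed = knn_propagate(X, y, categorical={1}, k=2)
```

Once every object carries a label, a feature selection measure can be evaluated on the completed data set as if it were fully supervised. The quality of the propagated labels therefore directly affects which features are selected.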
