Semi-supervised feature selection for partially labeled mixed-type data based on multi-criteria measure approach

Wenhao Shu, Jianhui Yu, Zhenchao Yan, Wenbin Qian

doi:10.1016/j.ijar.2022.11.020

Abstract

In many real applications, data are often described by features of different types, and label information is available for only some of the objects. Such data are referred to as partially labeled mixed-type data. Little work currently exists on feature selection approaches for these data. Motivated by this issue, this paper aims at selecting an informative feature subset from partially labeled mixed-type data. First, to improve classification performance, an improved label propagation algorithm based on K-nearest neighbors is proposed, which assigns decision labels to unlabeled objects by making use of the information between unlabeled and labeled objects. On this basis, a multi-criteria feature measure based on dependency, information entropy, and information granulation is proposed for selecting candidate features. Finally, the corresponding semi-supervised feature selection algorithm is developed to select the feature subset for partially labeled mixed-type data. Experimental results on UCI data sets demonstrate the effectiveness of the proposed feature selection algorithm and its superiority in classification accuracy over other algorithms.
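To make the label propagation step concrete, the following is a minimal sketch of plain K-nearest-neighbor label assignment on mixed-type data. It is not the improved algorithm proposed in the paper, whose details are not given in the abstract. The distance function (0/1 mismatch on categorical features, range-normalized absolute difference on numerical features), the majority vote, and the names `mixed_distance` and `knn_label_propagation` are all assumptions made for illustration.

```python
from collections import Counter


def mixed_distance(a, b, categorical, ranges):
    """Assumed heterogeneous distance: 0/1 mismatch on categorical
    features, range-normalized absolute difference on numerical ones."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in categorical:
            total += 0.0 if x == y else 1.0
        elif ranges[j] > 0:
            total += abs(x - y) / ranges[j]
    return total / len(a)


def knn_label_propagation(X, y, categorical, k=3):
    """Label each unlabeled object (y[i] is None) by majority vote
    among its k nearest originally labeled objects (single pass)."""
    numeric = [j for j in range(len(X[0])) if j not in categorical]
    # Value range of each numerical feature, used for normalization.
    ranges = {j: max(r[j] for r in X) - min(r[j] for r in X) for j in numeric}
    labeled = [i for i, lab in enumerate(y) if lab is not None]
    completed = list(y)
    for i, lab in enumerate(y):
        if lab is not None:
            continue
        nearest = sorted(
            labeled,
            key=lambda t: mixed_distance(X[i], X[t], categorical, ranges),
        )[:k]
        votes = Counter(y[t] for t in nearest)
        completed[i] = votes.most_common(1)[0][0]
    return completed


# Toy data (hypothetical): feature 0 is categorical, feature 1 is numerical.
X = [("red", 1.0), ("red", 1.2), ("blue", 5.0),
     ("blue", 5.5), ("red", 1.1), ("blue", 5.2)]
y = ["A", "A", "B", "B", None, None]
completed = knn_label_propagation(X, y, categorical={0}, k=3)
print(completed)  # ['A', 'A', 'B', 'B', 'A', 'B']
```

In this sketch the completed label vector would then be passed to the feature evaluation stage; how the paper weights neighbors or handles ties may differ.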

Full Text