Abstract

Partial Label Learning (PLL) aims to induce a multi-class classifier for the setting in which each training instance is associated with a set of candidate labels, among which only one is valid but unknown. Feature selection, which chooses the features relevant to the task (e.g., classification) and discards the irrelevant ones, is challenging for PLL due to the ambiguous labeling information. In this paper, a random forest based method, namely RFUTE, is developed to address the feature selection challenge in PLL. Because the ground-truth labels are inaccessible during training, RFUTE first disambiguates the candidate labels, and then performs feature selection by computing and ranking, for each feature, the total change of information entropy accumulated over all trees in the forest. Moreover, a supplement approach is proposed after disambiguation to avoid the class imbalance problem. Comprehensive experiments on both synthetic and real-world partial label data sets empirically justify the effectiveness of RFUTE in improving the generalization performance of well-established PLL methods.
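The two-stage pipeline sketched in the abstract (disambiguate candidate labels, then rank features by total entropy decrease over the forest) can be illustrated with the following minimal sketch. This is not RFUTE itself: the abstract does not specify the disambiguation procedure, so a naive random pick from each candidate set stands in for it, and scikit-learn's impurity-based importances with `criterion="entropy"` serve as the accumulated entropy-change score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 200, 6
X = rng.normal(size=(n, d))
true_y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only features 0 and 1 are informative

# Partial labels: each candidate set holds the true label plus a random distractor.
candidates = [{y, int(rng.integers(0, 2))} for y in true_y]

# Naive disambiguation (placeholder for RFUTE's procedure, which the
# abstract does not detail): pick one candidate per instance at random.
y_hat = np.array([rng.choice(sorted(c)) for c in candidates])

# Rank features by the total information-entropy decrease accumulated over
# all trees; with criterion="entropy", scikit-learn's impurity-based
# feature_importances_ compute exactly this quantity (normalized to sum to 1).
forest = RandomForestClassifier(n_estimators=100, criterion="entropy",
                                random_state=0)
forest.fit(X, y_hat)
ranking = np.argsort(forest.feature_importances_)[::-1]
print(ranking)  # features ordered from most to least informative
```

Selecting the top-k entries of `ranking` then yields the reduced feature set passed to any downstream PLL classifier.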
