Feature selection for multilabel classification has recently attracted substantial attention in many fields; however, some of the available methods ignore the correlations among labels and consequently yield low classification performance. In addition, most feature selection algorithms based on multilabel neighborhood rough sets (MNRS) can deal only with finite sets for multilabel data. To address these issues, this paper presents a novel hybrid filter-wrapper multilabel feature selection method based on binary particle swarm optimization (BPSO) and MNRS with the Lebesgue measure for multilabel neighborhood decision systems. First, because the traditional correlation-based feature selection (CFS) algorithm ignores the dependencies among labels, two types of average correlation are presented: one between single labels and label sets, and one among labels. By combining these with information-entropy-based uncertainty measures, a new average correlation among labels is studied, and a novel comprehensive evaluation function of CFS (NCFS) is constructed. Then, NCFS is introduced as the fitness function into the original and improved BPSO algorithms to optimize multilabel classification in the early and later stages, respectively; the optimization process terminates when the maximum number of iterations is reached. Next, the Lebesgue measure of the neighborhood class is developed to investigate the neighborhood approximation accuracy and the dependency degree based on MNRS. Various properties are deduced, and the relationships among these measures are used to evaluate the uncertainty and the correlations among labels of multilabel data. Finally, a hybrid filter-wrapper feature selection algorithm using NCFS-BPSO is designed to preliminarily eliminate redundant features and thereby reduce complexity, and a heuristic forward multilabel feature selection algorithm is proposed to improve the performance of multilabel classification.
Experimental results on fifteen multilabel datasets demonstrate that the proposed algorithms are effective in selecting significant features and achieve strong classification performance in multilabel neighborhood decision systems.
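To make the filter stage more concrete, the following is a minimal sketch of binary PSO driven by a correlation-based fitness, written under several assumptions. The exact form of NCFS is not given in the abstract, so the classical CFS merit is used as a stand-in, with feature-label correlation simply averaged over all labels. Absolute Pearson correlation replaces the entropy-based measures, and the function names (`pearson`, `cfs_merit`, `bpso`) and the parameters `w`, `c1`, `c2`, and `n_particles` are illustrative choices rather than the paper's. Only the plain sigmoid-transfer BPSO is shown; the improved BPSO variant and the MNRS-based forward search are not reproduced.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation of two sequences (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def cfs_merit(mask, features, labels):
    """Classical CFS merit for the feature subset encoded by a 0/1 mask.

    features and labels are lists of columns. Stand-in for NCFS (assumption):
    feature-label correlation is averaged over every label column.
    """
    chosen = [f for f, bit in zip(features, mask) if bit]
    k = len(chosen)
    if k == 0:
        return 0.0
    r_fl = sum(pearson(f, l) for f in chosen for l in labels) / (k * len(labels))
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(pearson(chosen[i], chosen[j]) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_fl / math.sqrt(k + k * (k - 1) * r_ff)


def bpso(features, labels, n_particles=10, max_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sigmoid-transfer binary PSO that maximises cfs_merit."""
    rng = random.Random(seed)
    d = len(features)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [cfs_merit(p, features, labels) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(max_iter):  # terminate only when the iteration budget is spent
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep exp() well-behaved
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = cfs_merit(pos[i], features, labels)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Calling `bpso(features, labels)` on column-wise data returns a 0/1 mask over the features together with its merit. In a hybrid scheme like the one described, such a mask would serve only as a preliminary reduction before a second, more expensive selection step.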