Abstract

Feature selection, an essential preprocessing step in multilabel classification, has been widely researched. Because multilabel datasets are diverse and complex, some feature selection methods are unstable and yield low predictive accuracy. To address these issues, this paper presents a novel multilabel feature selection method using multilabel ReliefF (ML-ReliefF) and neighborhood mutual information in multilabel neighborhood decision systems. First, to address the small number of randomly selected samples available when ReliefF searches for nearest samples, the coefficient of difference and the average distance among the nearest similar and heterogeneous samples are introduced to evaluate the differences among samples, and the average differences among similar or heterogeneous samples are then defined. Using the Jaccard correlation coefficient, a new formula for updating feature weights is presented. Second, the margin of a sample is studied to granulate all samples under each label, and the concept of the neighborhood is given. By combining the algebraic and information views, several neighborhood entropy-based uncertainty measures for multilabel classification are investigated, and a new neighborhood mutual information is proposed. Furthermore, an optimization objective function is constructed to evaluate candidate features in multilabel neighborhood decision systems, the properties of these measures are discussed, and the relationships among them are established. Finally, an improved ML-ReliefF algorithm is designed to preliminarily eliminate irrelevant features and thereby reduce the computational complexity of multilabel classification, and a heuristic forward multilabel feature selection algorithm is developed to remove redundant features and improve classification performance.
Experiments on thirteen multilabel datasets compare the proposed algorithms with representative methods and verify their effectiveness in multilabel neighborhood decision systems.
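
Two of the building blocks named above, the Jaccard coefficient between label sets and the margin of a sample, can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so this is only an assumption-laden illustration: `jaccard_similarity` and `sample_margin` are hypothetical helper names, the margin uses the standard single-label definition (distance to the nearest sample with a different label minus distance to the nearest sample with the same label), and the paper's multilabel weight-update rule and neighborhood construction may well differ.

```python
import numpy as np


def jaccard_similarity(y_a, y_b):
    """Jaccard coefficient between two binary label vectors."""
    y_a = np.asarray(y_a, dtype=bool)
    y_b = np.asarray(y_b, dtype=bool)
    union = np.logical_or(y_a, y_b).sum()
    if union == 0:
        # Assumption: two empty label sets are treated as identical.
        return 1.0
    return float(np.logical_and(y_a, y_b).sum() / union)


def sample_margin(X, y, i):
    """Standard margin of sample i under one label column y:
    distance to its nearest miss minus distance to its nearest hit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.linalg.norm(X - X[i], axis=1)  # Euclidean distance to every sample
    hits = y == y[i]
    hits[i] = False                          # exclude the sample itself
    misses = y != y[i]
    if not hits.any() or not misses.any():
        # Assumption: the margin is set to 0 when no hit or no miss exists.
        return 0.0
    return float(dist[misses].min() - dist[hits].min())


if __name__ == "__main__":
    # Label sets {0, 1} and {0, 2} share one label out of three in total.
    print(jaccard_similarity([1, 1, 0], [1, 0, 1]))
    # Nearest hit is 1.0 away, nearest miss is 3.0 away.
    print(sample_margin([[0.0], [1.0], [3.0]], [1, 1, 0], 0))
```

In a multilabel setting, one plausible use is to compute such a margin separately for each label column and take it as that label's neighborhood radius, but this reading is an assumption and is not confirmed by the abstract.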

