Abstract

In the real world, data from many domains tend to be high-dimensional, which can lead to considerable time complexity and poor performance in multi-label classification. Multi-label feature selection is an important preprocessing step in machine learning that can effectively alleviate the so-called “curse of dimensionality” by removing irrelevant and redundant features. However, related labels generally differ in significance for each instance, an issue that most existing multi-label feature selection algorithms have not addressed. In this paper, we therefore integrate label-distribution learning into multi-label feature selection from the perspective of granular computing while considering multiple feature correlations, and we develop a novel multi-label feature selection algorithm based on label distribution and feature complementarity. The proposed algorithm has two main parts: first, the different significances of the related labels for each instance in the multi-label data are obtained via granular computing; second, feature complementarity is estimated with neighborhood mutual information, without discretization. Comprehensive experiments on 10 publicly available multi-label datasets, evaluated with six widely used metrics, demonstrate the superiority of the proposed method over other state-of-the-art methods. The results show that the proposed method can significantly improve classifier performance while reducing the dimensionality of the original data.
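To make the idea of estimating complementarity "without discretization" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes one commonly used neighborhood-based formulation: the neighborhood of sample $x_i$ under a feature set $B$ is $\delta_B(x_i) = \{x_j : \Delta_B(x_i, x_j) \le \delta\}$, and

$$\mathrm{NMI}_\delta(B; C) = -\frac{1}{n}\sum_{i=1}^{n} \log \frac{|\delta_B(x_i)|\,|\delta_C(x_i)|}{n\,|\delta_{B \cup C}(x_i)|}.$$

The Chebyshev (max-abs) distance, the radius `delta`, treating a single discrete label column's neighborhood as its same-label set, and scoring complementarity as the interaction gain $\mathrm{NMI}(f_i f_j; Y) - \mathrm{NMI}(f_i; Y) - \mathrm{NMI}(f_j; Y)$ are all illustrative assumptions. The helper names are hypothetical.

```python
import numpy as np


def neighborhood_matrix(X, delta):
    """Boolean n x n matrix: entry (i, j) is True when sample j lies in the
    delta-neighborhood of sample i under the Chebyshev (max-abs) distance.
    Assumption: Chebyshev distance is used so that the joint neighborhood of
    two feature sets equals the intersection of their neighborhoods."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Pairwise max-abs differences over all feature columns.
    dist = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    return dist <= delta


def label_neighborhood(y):
    """For one discrete label column, the neighborhood of sample i is the
    set of samples sharing its label value."""
    y = np.asarray(y)
    return y[:, None] == y[None, :]


def neighborhood_mutual_information(N_a, N_b):
    """NMI from two boolean neighborhood matrices. The joint neighborhood is
    their intersection. Every sample is its own neighbor, so no size is 0."""
    n = N_a.shape[0]
    size_a = N_a.sum(axis=1)
    size_b = N_b.sum(axis=1)
    size_ab = (N_a & N_b).sum(axis=1)
    return -np.mean(np.log(size_a * size_b / (n * size_ab)))


def complementarity(N_i, N_j, N_y):
    """Hypothetical complementarity score: information the feature pair
    carries about the label beyond what each feature carries alone.
    Positive values indicate complementary features."""
    joint = N_i & N_j  # joint neighborhood of {f_i, f_j}
    return (neighborhood_mutual_information(joint, N_y)
            - neighborhood_mutual_information(N_i, N_y)
            - neighborhood_mutual_information(N_j, N_y))


# Usage: an XOR-style label that neither feature predicts on its own.
x1 = np.array([0.0, 0.0, 1.0, 1.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0, 1, 1, 0])
N1, N2, Ny = neighborhood_matrix(x1, 0.5), neighborhood_matrix(x2, 0.5), label_neighborhood(y)
print(neighborhood_mutual_information(N1, Ny))  # individually uninformative
print(complementarity(N1, N2, Ny))              # jointly informative (> 0)
```

In the XOR example each feature alone carries no neighborhood information about the label, while the pair determines it completely. This is the kind of interaction a purely relevance-based ranking would miss, and it is one way to read what "feature complementarity" refers to here.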
