Abstract

Multilabel classification has recently attracted increasing interest in machine learning and artificial intelligence. However, in most Relief methods, very large distances between samples easily make the selected heterogeneous or similar samples abnormal. In addition, when the classification margin is used as the neighborhood radius in some reduction algorithms, it may become meaningless if the margin is too large. To overcome these drawbacks, this paper presents a multilabel feature selection method that uses an improved Relief and minimum redundancy maximum relevance (MRMR) based on neighborhood rough sets. First, the number of heterogeneous and similar samples is introduced to improve the label weighting method, which eliminates the influence of large distances between samples. Combined with the new label weighting, the distance between a sample and its nearest-neighbor heterogeneous sample and the distance between the sample and its nearest-neighbor similar sample are used to develop a new feature weighting method. Second, the number of heterogeneous and similar samples is also used to improve the classification margin and thereby constrain the neighborhood radius. On this basis, the neighborhood approximation accuracy is constructed to measure both the uncertainty of samples in the boundary region and the completeness of knowledge. Third, by integrating the new neighborhood approximation accuracy, two types of mutual information are proposed, one between features and labels and one among features, and the mutual information-based MRMR model is then investigated to evaluate the significance of features. Finally, a multilabel feature selection algorithm is designed to improve the classification performance on multilabel data. Experimental results on thirteen public datasets illustrate the effectiveness of the developed algorithm, which selects significant features and achieves strong classification performance on multilabel datasets.
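For orientation, the nearest-neighbor similar and heterogeneous samples mentioned above are the "nearest hit" and "nearest miss" of the classic Relief method. The following is a minimal sketch of classic single-label Relief only, not the improved multilabel weighting proposed in the paper. The function name `relief_weights`, the Manhattan distance, and the min-max scaling are assumptions made for illustration, and every class is assumed to contain at least two samples.

```python
import numpy as np

def relief_weights(X, y):
    """Classic single-label Relief weights (illustrative sketch, not the paper's method)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    # Scale each feature to [0, 1] so per-feature differences are comparable
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to every sample
        dist[i] = np.inf                       # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest similar sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest heterogeneous sample
        # Reward features that differ on the miss, penalize those that differ on the hit
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n
```

In this sketch a feature that separates the classes receives a larger weight than one that varies within a class. The abstract's concern is that when these hit and miss distances are very large, the chosen neighbors are no longer representative, which is what the improved label weighting is meant to address.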

Highlights

  • Nowadays, multilabel classification has been widely used in many real-world applications [1]

  • The developed neighborhood approximation accuracy is combined with mutual information to investigate the new minimum redundancy maximum relevance (MRMR) model that can measure the relevance of features and labels and the redundancy among features

  • These datasets are divided into training and test subsets: the training subsets are used for feature selection, and the classification performance of the various methods is evaluated on the test subsets with several classifiers [41], namely MLKNN, Bayes, Lazy, Rules, and Trees, all with their parameters set to default values

Summary

INTRODUCTION

Multilabel classification has been widely used in many real-world applications [1]. Existing MRMR-based feature selection algorithms cannot evaluate the completeness of knowledge or fully eliminate redundant features, which reduces the prediction accuracy of multilabel classification. Motivated by these observations, the developed neighborhood approximation accuracy is combined with mutual information to investigate a new MRMR model that can measure both the relevance between features and labels and the redundancy among features. The proposed MRMR model based on neighborhood rough sets (NRS) helps to select significant features and to achieve strong performance on multilabel datasets.
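To make the relevance and redundancy trade-off concrete, here is a minimal sketch of the standard greedy MRMR criterion (relevance minus mean redundancy) using plain empirical mutual information on discrete data. It does not include the neighborhood approximation accuracy that the paper integrates into its mutual information. The names `mutual_information` and `mrmr_select`, and the choice to average relevance over labels, are assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete vectors."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def mrmr_select(X, Y, k):
    """Greedily pick k feature indices; X is (n, d) discrete, Y is (n, q) binary labels."""
    n_features = X.shape[1]
    # Relevance: mean MI between a feature and each label (an illustrative assumption)
    relevance = np.array([
        np.mean([mutual_information(X[:, j], Y[:, l]) for l in range(Y.shape[1])])
        for j in range(n_features)
    ])
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for j in remaining:
            # Redundancy: mean MI between the candidate and already selected features
            redundancy = (
                np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
                if selected else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

With this criterion, an exact copy of an already selected feature is penalized by its full redundancy, so a less relevant but non-redundant feature can be preferred over the copy.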

LARGE MARGIN
NEIGHBORHOOD ROUGH SETS
CONCLUSION