Abstract

Recently, feature selection for multilabel classification has attracted substantial attention in many fields; however, some of the available methods ignore the correlations among labels and yield low classification performance. In addition, most feature selection algorithms that are based on multilabel neighborhood rough sets (MNRS) can only deal with finite sets for multilabel data. To address these issues, this paper presents a novel hybrid filter-wrapper multilabel feature selection method that is based on binary particle swarm optimization (BPSO) and MNRS with the Lebesgue measure for multilabel neighborhood decision systems. First, to overcome the problem that the traditional correlation-based feature selection (CFS) algorithm ignores the dependencies among labels, two types of average correlation are presented: between single labels and label sets, and among labels. By combining these with information-entropy-based uncertainty measures, a new average correlation among labels is studied, and a novel comprehensive evaluation function of CFS (NCFS) is constructed. Then, NCFS is introduced as a fitness function into the original BPSO and improved BPSO algorithms to optimize multilabel classification in the early and later stages, respectively, and the optimization process is terminated when the maximum number of iterations is reached. Next, the Lebesgue measure of the neighborhood class is developed for investigating the neighborhood approximation accuracy and the dependency degree based on MNRS. Various properties are deduced, and the relationships among these measures are used to evaluate the uncertainty and correlations among labels of multilabel data. Finally, a hybrid filter-wrapper feature selection algorithm using NCFS-BPSO is designed for preliminarily eliminating redundant features to decrease the complexity, and a heuristic forward multilabel feature selection algorithm is proposed for improving the performance of multilabel classification.
Experimental results on fifteen multilabel datasets demonstrate that the proposed algorithms are effective in selecting significant features and achieving strong classification performance in multilabel neighborhood decision systems.
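To make the idea of using a CFS-style score as a BPSO fitness function concrete, the following is a minimal sketch in Python. It is not the NCFS function of this paper, whose definition is not given in the abstract. It assumes the classical single-label CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), computed with absolute Pearson correlations, and a standard sigmoid-transfer BPSO. The function names `cfs_merit` and `bpso` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def cfs_merit(X, y, mask):
    """Classical single-label CFS merit (assumption: NOT the paper's NCFS)."""
    idx = np.flatnonzero(mask)
    k = idx.size
    if k == 0:
        return 0.0
    # Mean absolute feature-label Pearson correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in idx])
    if k == 1:
        return float(r_cf)
    # Mean absolute feature-feature correlation (off-diagonal entries only).
    corr = np.abs(np.corrcoef(X[:, idx], rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))

def bpso(fitness, n_features, n_particles=20, n_iter=50,
         w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard BPSO with a sigmoid transfer function; maximizes `fitness`."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, size=(n_particles, n_features))
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    # Terminate when the maximum number of iterations is reached.
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -6.0, 6.0)  # keep the sigmoid away from saturation
        prob = 1.0 / (1.0 + np.exp(-vel))
        # Each bit is set to 1 with probability sigmoid(velocity).
        pos = (rng.random(pos.shape) < prob).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)
```

A call such as `bpso(lambda m: cfs_merit(X, y, m), X.shape[1])` would return a binary mask over the features together with its merit. Handling several labels, as the paper does, would require replacing `cfs_merit` with a multilabel score.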

Highlights

  • With the development of data processing technologies in machine learning and data mining, multilabel classification, as a supervised method, has attracted increasing attention from researchers in many real-world applications [1]. (The associate editor coordinating the review of this manuscript and approving it for publication was Huiling Chen.)

  • To develop a multilabel feature selection algorithm in multilabel neighborhood decision systems, this study focuses on three main objectives: (1) To overcome the problem that the traditional correlation-based feature selection (CFS) and multilabel correlation-based feature selection (MLCFS) models only evaluate the relationships between internal features and labels of samples and ignore the dependencies among labels in multilabel information systems, we investigate the average correlations between single labels and label sets and among labels

  • (3) To address the issue that the available multilabel neighborhood rough sets (MNRS)-based feature selection algorithms cannot deal with infinite sets for multilabel classification, by combining MNRS with the Lebesgue measure, a new MNRS model is proposed, and the concepts of neighborhood class, lower and upper approximations, neighborhood approximation accuracy, neighborhood dependency degree with neighborhood approximation accuracy and attribute significance are presented for multilabel neighborhood decision systems


Summary

INTRODUCTION

With the development of data processing technologies in machine learning and data mining, multilabel classification, as a supervised method, has attracted increasing attention from researchers in many real-world applications [1]. When handling continuous and numerical data, the process of discretizing these datasets may result in poor classification performance [20]. To overcome this drawback, researchers introduced neighborhood rough sets (NRS) as a filter strategy for investigating feature selection for multilabel classification [21]. (3) To address the issue that the available MNRS-based feature selection algorithms cannot deal with infinite sets for multilabel classification, by combining MNRS with the Lebesgue measure, a new MNRS model is proposed, and the concepts of neighborhood class, lower and upper approximations, neighborhood approximation accuracy, neighborhood dependency degree with neighborhood approximation accuracy and attribute significance are presented for multilabel neighborhood decision systems.
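As a rough illustration of the neighborhood dependency degree mentioned above, the sketch below computes the classical single-label NRS dependency on a finite sample: the fraction of samples whose δ-neighborhood contains only samples with the same label. This is an assumed simplification, not the paper's model. It uses plain counting where the paper uses the Lebesgue measure, one label where the paper handles label sets, and a Euclidean distance chosen here for convenience. The name `neighborhood_dependency` and the parameter `delta` are hypothetical.

```python
import numpy as np

def neighborhood_dependency(X, y, features, delta):
    """Classical finite-sample NRS dependency (assumption: counting measure,
    single label, Euclidean distance; not the paper's Lebesgue-based MNRS)."""
    Z = X[:, features]
    # Pairwise Euclidean distances restricted to the chosen features.
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    neigh = dist <= delta                # delta-neighborhood membership
    same = y[:, None] == y[None, :]      # label agreement between samples
    # A sample is in the positive region if every neighbor shares its label.
    consistent = np.all(~neigh | same, axis=1)
    return float(consistent.mean())
```

A heuristic forward search in this spirit would repeatedly add the feature that raises this dependency the most and stop once no candidate improves it.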

BINARY PARTICLE SWARM OPTIMIZATION
MULTILABEL NEIGHBORHOOD ROUGH SETS
LEBESGUE MEASURE-BASED MNRS
FEATURE SELECTION IN MULTILABEL
EXPERIMENT PREPARATION
COMPARISON OF NCFS-BPSO WITH CFS-BPSO
CONCLUSION