Multilabel Feature Selection Using Mutual Information and ML-ReliefF for Multilabel Classification

Enhui Shi, Shiguang Zhang, Lin Sun, Jiucheng Xu

doi:10.1109/access.2020.3014916

Abstract

Recently, multilabel classification algorithms have played an increasingly significant role in data mining and machine learning. However, some existing mutual information-based algorithms ignore the influence of label proportions on the correlation degree between features and label sets. In addition, most traditional ReliefF algorithms cannot accurately measure the correlation degree of label sets, and the way heterogeneous neighbors are divided leads to repeated calculation. To overcome these shortcomings, this paper proposes a multilabel feature selection method using mutual information and an improved multilabel ReliefF (ML-ReliefF). First, the proportion of each label in the label space is calculated and combined with the mutual information between features and labels to construct a novel correlation degree between features and label sets; this correlation degree is used to preprocess multilabel datasets and thereby reduce the runtime of ML-ReliefF. Second, the mutual information of label sets is introduced to improve the accuracy of the correlation degree among label sets, and two types of label-set correlation degree based on ML-ReliefF are developed to divide similar and heterogeneous samples more clearly. Third, a division method for heterogeneous neighbors is presented to avoid repeated calculation in ML-ReliefF, and a novel ML-ReliefF-based feature weighting method is constructed to evaluate the importance of features. Finally, a multilabel feature selection algorithm based on mutual information and ML-ReliefF is designed to improve multilabel classification performance. Experiments on fourteen multilabel datasets demonstrate the effectiveness of the algorithm and its improvement of classification performance on multilabel data.
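The first step can be illustrated with a minimal Python sketch. It assumes discrete (or pre-discretized) feature values, binary label columns, and a hypothetical definition of the label proportion as each label's share of all positive label assignments; the correlation degree is then taken to be the proportion-weighted sum of the mutual information between a feature and each label. The helper names (`mutual_information`, `label_proportions`, `feature_label_set_correlation`, `preselect`) and the `keep` cutoff are illustrative, not the paper's exact definitions.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))) rewritten with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def label_proportions(Y):
    """Share of each label among all positive label assignments (assumed definition)."""
    counts = [sum(row[j] for row in Y) for j in range(len(Y[0]))]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def feature_label_set_correlation(X, Y):
    """Hypothetical correlation degree of feature p with the label set:
    sum over labels j of proportion_j * I(f_p; l_j)."""
    w = label_proportions(Y)
    scores = []
    for p in range(len(X[0])):
        f = [row[p] for row in X]
        scores.append(sum(w[j] * mutual_information(f, [row[j] for row in Y])
                          for j in range(len(Y[0]))))
    return scores


def preselect(X, Y, keep):
    """Indices of the `keep` highest-scoring features (hypothetical pre-filter
    applied before the more expensive ReliefF-style weighting)."""
    scores = feature_label_set_correlation(X, Y)
    return sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)[:keep]
```

For example, with X = [[0, 1], [1, 1], [0, 1], [1, 1]] and Y = [[0, 1], [1, 0], [0, 1], [1, 1]], feature 0 copies label 0 while feature 1 is constant, so `preselect(X, Y, 1)` returns `[0]`. Filtering this way shrinks the feature set that the later ML-ReliefF stage has to weight.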

Highlights

At present, multilabel classification has drawn widespread attention. Multilabel data, in which each sample is associated with a set of labels, often contain a vast number of noisy, redundant, or irrelevant features that decrease classification accuracy [1].
To measure the correlation degree among label sets in multilabel classification, multilabel feature selection algorithms generally use mutual information and information entropy as uncertainty measures that account for label dependency [38].
Taking x_i as an example, for each label l_j ∉ LS_i, the samples x_m whose label sets LS_m contain l_j are found and the correlation degree of label sets between x_i and each x_m is computed; the obtained values are then arranged in ascending order, and the first k samples are selected as the heterogeneous neighbors of x_i under l_j.
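The neighbor-division step described in the last highlight can be sketched in Python as follows. The correlation degree of label sets is not defined in this summary, so the sketch plugs in Jaccard similarity as a hypothetical stand-in; `heterogeneous_neighbors` and `jaccard` are illustrative names. Caching each x_m's correlation with x_i, so that a sample containing several labels absent from LS_i is scored only once, is one possible way to avoid repeated calculation and is an assumption, not necessarily the paper's mechanism.

```python
def jaccard(a, b):
    """Hypothetical stand-in for the label-set correlation degree: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def heterogeneous_neighbors(i, label_sets, labels, k, corr=jaccard):
    """For each label l_j not in LS_i, return the k samples x_m with l_j in LS_m
    whose label-set correlation with x_i is smallest (ascending order)."""
    ls_i = label_sets[i]
    cache = {}      # correlation of x_i with x_m, computed at most once per m
    neighbors = {}
    for lj in labels:
        if lj in ls_i:
            continue  # only labels absent from x_i define heterogeneous neighbors
        candidates = [m for m, ls_m in enumerate(label_sets) if lj in ls_m]
        for m in candidates:
            if m not in cache:
                cache[m] = corr(ls_i, label_sets[m])
        # ascending by correlation; ties broken by sample index for determinism
        candidates.sort(key=lambda m: (cache[m], m))
        neighbors[lj] = candidates[:k]
    return neighbors
```

With label_sets = [{"a"}, {"a", "b"}, {"b"}, {"b", "c"}, {"c"}], calling `heterogeneous_neighbors(0, label_sets, ["a", "b", "c"], 2)` returns `{"b": [2, 3], "c": [3, 4]}`; sample 3 is shared by labels b and c but its correlation with x_0 is computed only once.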

Summary

INTRODUCTION

Multilabel classification has drawn widespread attention, and multilabel data, in which each sample carries a set of labels, often contain a vast number of noisy, redundant, or irrelevant features that decrease classification accuracy [1]. Lin et al. [30] proposed a multilabel feature selection algorithm based on max-dependency and min-redundancy using mutual information; however, this method ignored the correlation among labels. Kong et al. [21] constructed a multilabel feature selection algorithm based on ReliefF and the F-statistic, but this model only considered the correlation degree between pairs of labels. Cai et al. [35] proposed a ReliefF-based multilabel feature selection algorithm that took into account the correlation degree among label sets. In this model, the sample similarity and the feature-wise distances between a sample and its similar and heterogeneous neighbors were used to compute the weights of all features; samples were repeatedly selected at random, and the weights from the previous iteration were updated until all iterations were completed. In the corresponding weighting formula, W_p is the weight of feature p; sim_{i,j} is the similarity between sample x_i and H_j (or M_j); H_j denotes a similar neighbor of x_i and M_j a heterogeneous neighbor of x_i; d(p, x_i, H_j) and d(p, x_i, M_j) are the distances on feature p between x_i and H_j and between x_i and M_j, respectively; k is the number of similar or heterogeneous neighbors of x_i; P(C) is the prior probability of label C; and m is the number of iterations.
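Because the weighting formula itself is not reproduced here, the following Python sketch shows only an assumed form of one update step: a classic ReliefF update in which each neighbor's per-feature distance is multiplied by its similarity sim_{i,j}, heterogeneous terms are weighted by P(C) renormalized over the labels that have heterogeneous neighbors, and every term is divided by m times the number of neighbors. Features are assumed to be scaled to [0, 1] so that d(p, x_i, y) = |x_i[p] - y[p]|. The function name `relief_update` and its argument layout are hypothetical, and the exact weighting in Cai et al. [35] may differ.

```python
def relief_update(W, xi, hits, misses, prior, m):
    """One assumed ReliefF-style weight update for a sampled instance xi.

    W      : current feature weights W_p
    xi     : feature vector of the sampled instance (features scaled to [0, 1])
    hits   : list of (H_j, sim_ij) pairs, the similar neighbors of xi
    misses : dict label C -> list of (M_j, sim_ij) pairs, heterogeneous neighbors under C
    prior  : dict label C -> P(C)
    m      : total number of iterations
    """
    new_W = list(W)
    # renormalise priors over the labels that contribute heterogeneous neighbors
    z = sum(prior[c] for c in misses) or 1.0
    for p in range(len(W)):
        near = 0.0
        if hits:
            near = sum(s * abs(xi[p] - h[p]) for h, s in hits) / (m * len(hits))
        far = 0.0
        for c, nbrs in misses.items():
            if nbrs:
                far += (prior[c] / z) * sum(s * abs(xi[p] - mj[p]) for mj, s in nbrs) / (m * len(nbrs))
        # features that separate xi from heterogeneous neighbors gain weight;
        # features that separate it from similar neighbors lose weight
        new_W[p] = W[p] - near + far
    return new_W
```

For instance, starting from W = [0, 0] with x_i = [0.0, 0.5], one similar neighbor [0.0, 0.0] and one heterogeneous neighbor [1.0, 0.5] under label C (both with similarity 1.0, P(C) = 0.5, m = 1), the update returns [1.0, -0.5]: feature 0 separates x_i from its heterogeneous neighbor and gains weight, while feature 1 separates it only from its similar neighbor and loses weight.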

CORRELATION DEGREE OF FEATURES AND LABEL SETS

ML-RELIEFF WITH CORRELATION

ML-RELIEFF WITH HETEROGENEOUS DIVISION

MULTILABEL FEATURE SELECTION ALGORITHM

CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 7	License type: CC BY 4.0

Similar Papers

Multi-Label Feature Selection with Conditional Mutual Information.
Xiujuan Wang ... Yuchen Zhou
Computational Intelligence and Neuroscience | VOL. 2022
08 Oct 2022

Multi-label classification approach for quranic verses labeling
Abdullahi Adeleke ... Riswan Efendi
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 24
01 Oct 2021

Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems
Lin Sun ... Jiucheng Xu
Information Sciences | VOL. 537
07 Jun 2020

A label-specific multi-label feature selection algorithm based on the Pareto dominance concept
Shima Kashef ... Hossein Nezamabadi-Pour
Pattern Recognition | VOL. 88
17 Dec 2018