Abstract

With the rapid growth of the Internet, the curse of dimensionality caused by massive multi-label data has attracted extensive attention, and feature selection plays an indispensable role in dimensionality reduction. Many researchers have approached this problem from the standpoint of information theory. Here, to evaluate feature relevance, we design a novel feature relevance term (FR) that employs three incremental information terms to comprehensively consider three key aspects: candidate features, selected features, and label correlations. A thorough examination of these three aspects makes FR better suited to capturing the optimal features. Moreover, we employ a label-related feature redundancy term (LR) to reduce unnecessary redundancy. Integrating FR with LR yields the proposed multi-label feature selection method, Feature Selection combining three types of Conditional Relevance (TCRFS). Extensive experiments indicate that TCRFS outperforms 6 state-of-the-art multi-label approaches on 13 multi-label benchmark data sets from 4 domains.
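
The exact formulas for FR and LR are not reproduced in this summary. As a hedged illustration only, the following Python sketch shows the generic greedy relevance-minus-redundancy scheme that information-theoretic multi-label feature selection methods of this kind instantiate; the function greedy_mi_feature_selection, the toy data, and the simple I(f; y) − I(f; s) score are illustrative assumptions, not the TCRFS criterion.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def greedy_mi_feature_selection(X, Y, k):
    """Greedily pick k features by a relevance-minus-redundancy score.

    X: (n_samples, n_features) discrete feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    NOTE: a generic sketch, not the TCRFS scoring function.
    """
    selected, remaining = [], set(range(X.shape[1]))
    while len(selected) < k and remaining:
        best_f, best_score = None, -np.inf
        for f in remaining:
            # Relevance: information the candidate carries about each label.
            relevance = sum(mutual_info_score(X[:, f], Y[:, j])
                            for j in range(Y.shape[1]))
            # Redundancy: information shared with already-selected features.
            redundancy = sum(mutual_info_score(X[:, f], X[:, s])
                             for s in selected)
            score = relevance - redundancy
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected


# Toy usage: 100 samples, 8 discrete features, 3 binary labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(100, 8))
Y = rng.integers(0, 2, size=(100, 3))
print(greedy_mi_feature_selection(X, Y, k=3))
```

TCRFS differs from this baseline in that its relevance term conditions on selected features and label correlations (via three incremental information terms), and its redundancy term is label-related rather than the unconditional I(f; s) used above.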

Highlights

  • In recent years, multi-label learning [1,2,3,4] has become increasingly popular in applications such as text categorization [5], image annotation [6], and protein function prediction [7].

  • Feature selection is the process of selecting a subset of discriminative features from the original data set according to specific evaluation criteria.

  • Hamming Loss is evaluated with the Multi-Label k-Nearest Neighbor (ML-kNN) classifier (k = 10), and the Macro-F1 and Micro-F1 measures are evaluated with the Support Vector Machine (SVM) and 3-Nearest Neighbor (3NN) classifiers (a hedged scikit-learn sketch of this protocol follows the list).
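
As a hedged, self-contained sketch of this evaluation protocol using scikit-learn: the paper's ML-kNN classifier is typically provided by the separate scikit-multilearn package, so a plain multi-output kNN stands in for it here, and the data below is synthetic.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import hamming_loss, f1_score
from sklearn.model_selection import train_test_split

# Synthetic multi-label data: 200 samples, 20 features, 4 binary labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
Y = rng.integers(0, 2, size=(200, 4))
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=42)

# Hamming Loss with a kNN classifier (k = 10); a rough stand-in for ML-kNN.
knn = KNeighborsClassifier(n_neighbors=10).fit(X_tr, Y_tr)
print("Hamming Loss:", hamming_loss(Y_te, knn.predict(X_te)))

# Macro-F1 / Micro-F1 with SVM and 3NN classifiers.
for name, clf in [("SVM", MultiOutputClassifier(LinearSVC())),
                  ("3NN", KNeighborsClassifier(n_neighbors=3))]:
    pred = clf.fit(X_tr, Y_tr).predict(X_te)
    print(name,
          "Macro-F1:", f1_score(Y_te, pred, average="macro", zero_division=0),
          "Micro-F1:", f1_score(Y_te, pred, average="micro", zero_division=0))
```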



Introduction

Multi-label learning [1,2,3,4] has become increasingly popular in applications such as text categorization [5], image annotation [6], and protein function prediction [7]. Feature selection has also proven effective for chatter vibration diagnosis in CNC machines [9]. The number of features in multi-label text data frequently reaches the tens of thousands, many of which are redundant or irrelevant [11,12]. Eliminating redundant or irrelevant features improves model accuracy and reduces the feature dimensionality, feature space, and running time [14,15]. The selected features are also more conducive to model understanding and data analysis.
