Abstract

With the rapid advancement of big data technology, high-dimensional multi-label datasets have become prevalent in many fields. However, these datasets often contain many irrelevant and redundant features, which can adversely affect the performance of machine learning algorithms. Multi-label feature selection (MLFS) has emerged as a crucial pre-processing step in multi-label learning to address this issue. This survey provides an overview of multi-label learning and its algorithms, including problem transformation and algorithm adaptation. We also introduce the three traditional strategies for MLFS: filter, wrapper, and embedded methods. Furthermore, we categorize existing research on multi-label feature selection into six aspects based on label fusion: label transformation-based (Binary Relevance-based and Label Powerset-based), label correlation-based (second-order, high-order, and hybrid-order), label-specific-based, semi-supervised-learning-based, missing- and noisy-label-based, and label enhancement-based approaches. We give a detailed introduction to the common approaches and theories underlying each method. Additionally, we conduct experimental comparisons on practical multi-label learning datasets to evaluate the advantages and disadvantages of different algorithms. We discuss the application of multi-label feature selection in various domains, such as data mining, computer vision, natural language processing, and bio-informatics. Finally, we outline potential future research directions in multi-label feature selection, including MLFS with online learning, active learning, label distribution learning, partial label learning, granular computing, and class-imbalanced learning.
