Abstract

The number of text documents on the web has grown massively, increasing the need for methods that can organize and classify electronic documents (instances) automatically. Multi-label classification, which assigns multiple labels to each document simultaneously, is widely used in real-world problems and has been applied in different applications. Few studies have investigated the multi-label text classification problem in the Arabic language, and those that exist are insufficient. Therefore, this survey paper presents an extensive review of the existing methods and techniques that can deal with the multi-label problem. We also focus on the Arabic language by covering the relevant applications of multi-label classification to Arabic text and identifying the main challenges faced by these studies. Furthermore, this survey presents an experimental comparison of different multi-label classification methods applied in the Arabic context and points out some baseline results. We found that further investigation is needed to improve the multi-label classification task in the Arabic language, especially the hierarchical classification task.

Highlights

  • There is a massive growth of text documents on the web

  • The multilabel classification (MLC) task is widely used in real-world problems and has been applied in different applications, such as the classification of digital libraries, electronic emails, electronic books, patents, and newspaper articles

  • The results showed that, because Hierarchy Of Multilabel classifiERs (HOMER) relies on similarity-based distribution and employs Binary Relevance (BR) and Naive Bayes (NB) classifiers, it reduces the computational cost in both the training and test phases and also improves predictive performance

Summary

Flat Classification

As presented in [2], a simple and straightforward way to handle the MLC task is the BR method. It transforms an MLC problem into several single-label classification problems and predicts the relevance of each label independently by training one binary classifier per label. Among the methods based on multi-class classification, the simplest and most standard is LP. It transforms the MLC problem into a multi-class classification problem and treats each distinct label-set in the training data as a new class of a multi-class classification task [20]. Another method offers some advantages: it considers label correlation and overcomes the LP limitation of an increasing number of distinct label-sets, giving more accurate label prediction and competitive performance. A noted drawback is the increasing number of classifiers generated according to a random number k that determines the size of each model. Finally, LaCovaC is an algorithm based on C4.5 that learns the relations between labels and exploits them to improve predictive performance.
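The two problem transformations described above can be illustrated with a minimal sketch. It only builds the training targets: one binary target vector per label for BR, and one class id per distinct label-set for the multi-class transformation (commonly called Label Powerset, LP). The toy label-sets and the function names are assumptions made for illustration and do not come from the survey; no classifier is trained here.

```python
# Toy multi-label data: the label-set of each of four documents.
# Labels and documents are invented for illustration only.
LABEL_SETS = [
    {"sport", "politics"},
    {"sport"},
    {"economy", "politics"},
    {"sport", "politics"},
]


def binary_relevance_targets(label_sets):
    """BR: one independent binary target vector per label."""
    all_labels = sorted(set().union(*label_sets))
    return {lab: [int(lab in s) for s in label_sets] for lab in all_labels}


def label_powerset_targets(label_sets):
    """LP: each distinct label-set becomes one class of a multi-class task."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            # First time this label-set is seen: give it a new class id.
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


br = binary_relevance_targets(LABEL_SETS)
lp_targets, lp_classes = label_powerset_targets(LABEL_SETS)

print(br)          # one 0/1 vector per label
print(lp_targets)  # one class id per document
```

In this sketch BR would need three binary classifiers (one per label), while LP would need a single multi-class classifier over three classes. The number of LP classes grows with the number of distinct label-sets in the training data, which is the limitation the text refers to.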

Hierarchical Multi-Label Classification
Multi-Label Evaluation
Example-based Metrics
Label-based Metrics
Lexicon Approach
PT Methods based on Binary Classification
PT Methods based on Multi-Class Classification
Algorithm Adaptation Technique