Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms

Bassam Al-Salemi, Masri Ayob, Graham Kendall, Shahrul Azman Mohd Noah

doi:10.1016/j.ipm.2018.09.008

Abstract

Multi-label text categorization is the problem of assigning each document to a subset of categories by means of multi-label learning algorithms. Unlike English and most other languages, Arabic lacks benchmark datasets, which prevents the evaluation of multi-label learning algorithms for Arabic text categorization. As a result, only a few recent studies have dealt with multi-label Arabic text categorization, and they used non-benchmark, inaccessible datasets. This work therefore aims to promote multi-label Arabic text categorization by (a) introducing “RTAnews”, a new benchmark dataset of multi-label Arabic news articles for text categorization and other supervised learning tasks, which is publicly available in several formats compatible with existing multi-label learning tools such as MEKA and Mulan; and (b) conducting an extensive comparison of most of the well-known multi-label learning algorithms for Arabic text categorization, in order to establish baseline results and show the effectiveness of these algorithms on RTAnews. The evaluation involves four transformation-based algorithms (Binary Relevance, Classifier Chains, Calibrated Ranking by Pairwise Comparison and Label Powerset), each with three base learners (Support Vector Machine, k-Nearest-Neighbors and Random Forest), and four adaptation-based algorithms (Multi-label kNN, Instance-Based Learning by Logistic Regression Multi-label, Binary Relevance kNN and RFBoost). The reported baseline results show that RFBoost and Label Powerset with Support Vector Machine as base learner outperformed the other compared algorithms. The results also show that adaptation-based algorithms are faster than transformation-based algorithms.
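To make the distinction between two of the transformation-based algorithms named in the abstract concrete, here is a minimal Python sketch of how Binary Relevance and Label Powerset reshape multi-label targets before a base learner is trained. The documents, category names, and function names are invented for illustration; they are not taken from RTAnews, MEKA, or Mulan, and the sketch covers only the target transformation, not the classifiers evaluated in the paper.

```python
# Toy multi-label data: each document carries a set of category labels.
# Documents and labels are hypothetical, invented for this illustration.
docs = ["d1", "d2", "d3", "d4"]
labels = [
    {"sports"},
    {"politics", "economy"},
    {"sports", "economy"},
    {"politics", "economy"},
]


def binary_relevance_targets(label_sets):
    """Binary Relevance: one 0/1 target vector per category,
    so each category gets its own independent binary problem."""
    categories = sorted(set().union(*label_sets))
    return {c: [int(c in s) for s in label_sets] for c in categories}


def label_powerset_targets(label_sets):
    """Label Powerset: every distinct label subset becomes one class,
    turning the task into a single multi-class problem."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = tuple(sorted(s))
        # Assign the next free class id the first time a subset is seen.
        class_ids.setdefault(key, len(class_ids))
        targets.append(class_ids[key])
    return targets, class_ids


br = binary_relevance_targets(labels)
lp, lp_classes = label_powerset_targets(labels)

for category, target in br.items():
    print(category, target)
print(lp, lp_classes)
```

In this sketch Binary Relevance yields three separate binary problems (one per category), while Label Powerset yields one problem with three classes, because documents d2 and d4 share the same label subset.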

Journal: Information Processing & Management
Publication Date: Oct 22, 2018
Citations: 45

Similar Papers

A Proposed Arabic Text Classification Model using Multi-Label System
Hussain A Rahmana ... Salwa S Baawi
Journal of Al-Qadisiyah for Computer Science and Mathematics | VOL. 15
30 Sep 2023

Harnessing Multi-label Classification Approaches for Economic Phenomena Categorization
Nofriani ... Novianto Budi Kurniawan
ASEAN Journal on Science and Technology for Development | VOL. 38
31 Aug 2021

Business text classification with imbalanced data and moderately large label spaces for digital transformation
Muhammad Arslan ... Christophe Cruz
Applied Network Science | VOL. 9
30 Apr 2024

Arabic Text Categorization Using Logistic Regression
Mayy M Al-Tahrawi
International Journal of Intelligent Systems and Applications | VOL. 7
08 May 2015
