Abstract

Multi-label text classification, in which each document can be assigned several categories simultaneously, has grown in popularity in recent years. Although the Arabic language has a complex morphology and a rich nature, research on this topic for Arabic remains limited. This study therefore presents a method for the multi-label classification of Arabic texts based on two problem-transformation methods: binary relevance and label power set. Three classifiers, namely logistic regression (LR), random forest (RF), and multinomial naïve Bayes (MNB), were evaluated experimentally in this thesis. In addition, chi-square feature selection was investigated to improve the performance of the proposed model. The experiments were implemented in Python using the "RTANews" multi-label Arabic text classification dataset. The results suggest that binary relevance combined with logistic regression performs best, achieving a micro-averaged recall of 0.8646, while the best label power set result, also obtained with logistic regression and measured by the same metric, was 0.8418.
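The difference between the two transformation methods can be illustrated with a short sketch. This is a minimal, hypothetical illustration rather than the thesis's implementation. It assumes pre-tokenized documents and uses a from-scratch multinomial naive Bayes (one of the three classifiers studied) as the base learner. It omits chi-square feature selection, and its toy data and helper names (`binary_relevance_fit`, `label_powerset_fit`, `micro_recall`, etc.) are invented for illustration. Binary relevance trains one independent yes/no classifier per label. Label power set trains a single multiclass classifier whose classes are the distinct label combinations seen in training. Micro-averaged recall pools true positives and false negatives over all labels before dividing.

```python
import math
from collections import Counter


class MultinomialNB:
    """Minimal multinomial naive Bayes with Laplace smoothing over token lists."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, targets):
        self.classes = sorted(set(targets))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, t in zip(docs, targets) if t == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                tok: math.log((counts[tok] + self.alpha) / denom) for tok in self.vocab
            }
        return self

    def predict_one(self, doc):
        # Return the class with the highest log-posterior; tokens unseen in training are ignored.
        def log_posterior(c):
            return self.log_prior[c] + sum(
                self.log_lik[c][tok] for tok in doc if tok in self.vocab
            )
        return max(self.classes, key=log_posterior)


def binary_relevance_fit(docs, label_sets, labels):
    # Hypothetical helper: one independent binary classifier per label ("does the document carry `lab`?").
    return {
        lab: MultinomialNB().fit(docs, [lab in ls for ls in label_sets])
        for lab in labels
    }


def binary_relevance_predict(models, doc):
    # The predicted label set is every label whose binary classifier answers True.
    return {lab for lab, model in models.items() if model.predict_one(doc)}


def label_powerset_fit(docs, label_sets):
    # Hypothetical helper: each distinct label combination becomes one class of a single multiclass model.
    return MultinomialNB().fit(docs, [tuple(sorted(ls)) for ls in label_sets])


def label_powerset_predict(model, doc):
    # Decode the winning combination back into a label set.
    return set(model.predict_one(doc))


def micro_recall(true_sets, pred_sets):
    # Pool true positives and false negatives over all labels and documents.
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy corpus standing in for tokenized RTANews articles.
train_docs = [
    ["match", "team", "goal"],
    ["team", "coach", "match"],
    ["election", "minister", "vote"],
    ["market", "oil", "prices"],
    ["minister", "budget", "market"],
]
train_labels = [{"sports"}, {"sports"}, {"politics"}, {"economy"}, {"politics", "economy"}]
labels = sorted(set().union(*train_labels))

br = binary_relevance_fit(train_docs, train_labels, labels)
lp = label_powerset_fit(train_docs, train_labels)

test_docs = [["team", "match"], ["minister", "market"]]
test_labels = [{"sports"}, {"politics", "economy"}]
br_pred = [binary_relevance_predict(br, d) for d in test_docs]
lp_pred = [label_powerset_predict(lp, d) for d in test_docs]
print("BR:", [sorted(p) for p in br_pred], micro_recall(test_labels, br_pred))
print("LP:", [sorted(p) for p in lp_pred], micro_recall(test_labels, lp_pred))
```

On this toy data both strategies recover the test label sets exactly. The design trade-off appears when a test document carries a label combination never seen in training. Label power set cannot predict such a combination, whereas binary relevance can, at the cost of ignoring correlations between labels.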
