Abstract

Multi-label text classification, in which each document is assigned several categories simultaneously, has grown in popularity in recent years. The Arabic language has a very complex morphology and a vibrant nature; nonetheless, research on this topic for Arabic remains limited. This study therefore presents a method for the multi-label classification of Arabic texts based on two problem transformation methods: binary relevance and label power-set. Three classifiers, namely logistic regression (LR), random forest (RF), and multinomial naïve Bayes (MNB), were experimentally assessed in this study. Furthermore, chi-square feature selection was investigated as a way to improve the performance of the proposed model. The experiments were implemented in Python using the "RTANews" multi-label Arabic text classification dataset. The results suggest that binary relevance combined with logistic regression produces the best results, with a micro-averaged recall of 0.8646. For the label power-set method, the best result was obtained with the same classifier and metric, at 0.8418.
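As a rough illustration of the two problem transformations and the micro-averaged recall mentioned above, the following sketch uses only the Python standard library. The toy label names and the helper function names are hypothetical and are not taken from the study; the classifiers and the chi-square feature selection step are deliberately left out.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def binary_relevance_targets(
    label_sets: Sequence[Set[str]], labels: Sequence[str]
) -> Dict[str, List[int]]:
    """Binary relevance: one independent 0/1 target vector per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def label_powerset_targets(
    label_sets: Sequence[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label power-set: each distinct label combination becomes one class id."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


def micro_recall(
    true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]
) -> float:
    """Micro-averaged recall: pool TP and FN over all documents and labels."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data with hypothetical labels (not from the RTANews dataset).
y_true = [{"politics", "economy"}, {"sports"}, {"politics"}, {"politics", "economy"}]
y_pred = [{"politics"}, {"sports"}, {"politics", "economy"}, {"politics", "economy"}]

br_targets = binary_relevance_targets(y_true, ["economy", "politics", "sports"])
lp_targets, lp_classes = label_powerset_targets(y_true)
recall = micro_recall(y_true, y_pred)
```

Under binary relevance, each entry of `br_targets` would be handed to its own binary classifier, whereas under label power-set a single multi-class classifier would be trained on `lp_targets`.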

