Abstract

Text categorization is the process of grouping texts or documents into classes or categories according to their content, and it is a significant task in natural language processing. Most existing work has focused on English text, with only a few experiments on Arabic text. The text classification process consists of several steps, from document preprocessing (stop-word removal and stemming) to feature extraction and classification. This work proposes an improved approach to Arabic text categorization that uses mutual information in a hybrid deep learning model for classification. Two datasets of Arabic documents are used to test the proposed model. The experimental results show that the proposed mutual information approach outperforms prior techniques. On the Akhbarona corpus, the Multi-Layer Perceptron had the lowest accuracy, 96.09%, while the hybrid Convolution-Long Short-Term Memory model reached 99.28%. On the Khaleej corpus, the Gated Recurrent Unit had the highest accuracy, 98.23%, while the Multi-Layer Perceptron had the lowest, 97.23%.
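
The abstract does not say how mutual information is computed or where it enters the pipeline. As a minimal sketch, assuming it is used to score terms for feature selection before classification, the mutual information between the presence of a term and membership in a category can be estimated from document counts. The function names `mutual_information` and `top_terms`, and the representation of each document as a set of tokens, are illustrative assumptions and not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term, category):
    """Estimate MI (in bits) between 'term occurs in doc' and 'doc is in category'.

    docs   : list of token sets, one per (already preprocessed) document
    labels : list of category names, aligned with docs
    """
    n = len(docs)
    # Joint counts over the pair (term present?, document in category?).
    joint = Counter((term in doc, label == category)
                    for doc, label in zip(docs, labels))
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        # Marginals obtained by summing the joint counts.
        p_t = sum(v for (tt, _), v in joint.items() if tt == t) / n
        p_c = sum(v for (_, cc), v in joint.items() if cc == c) / n
        mi += p_tc * math.log2(p_tc / (p_t * p_c))
    return mi


def top_terms(docs, labels, k):
    """Return the k terms with the highest MI against any category.

    Taking the maximum over categories is one common choice (an assumption
    here); ties are broken alphabetically so the result is deterministic.
    """
    vocab = {tok for doc in docs for tok in doc}
    categories = set(labels)
    scores = {
        term: max(mutual_information(docs, labels, term, c) for c in categories)
        for term in vocab
    }
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

In a setup like this, only the selected terms would be passed on to feature extraction and to the classifier, which reduces the input dimensionality. Whether the paper applies the score per category, globally, or in some other way is not stated in the abstract.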
