A Superior Arabic Text Categorization Deep Model (SATCDM)

M Alhawarat, Ahmad O Aseeri

doi:10.1109/access.2020.2970504

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 36
License type: CC BY 4.0

Affiliation: Prince Sattam Bin Abdulaziz University

Abstract

Categorizing Arabic text documents is an important research topic in Natural Language Processing (NLP) and Machine Learning (ML). The number of Arabic documents grows daily as new web pages, news articles, and social media content are published, so classifying these documents into specific classes matters to many people and applications. A Convolutional Neural Network (CNN) is a class of deep learning model that has proven useful for many NLP tasks in English, including text translation and text categorization. Word embedding is a text representation that maps terms to real-valued vectors in a vector space, capturing both syntactic and semantic traits of text. Current studies on classifying Arabic text documents use traditional representations such as bag-of-words and TF-IDF weighting, and few use word embedding. Traditional ML algorithms have already been applied to Arabic text categorization with good results. In this study, we present a Multi-Kernel CNN model enriched with n-gram word embedding for classifying Arabic news documents, which we call A Superior Arabic Text Categorization Deep Model (SATCDM). Evaluated on 15 freely available datasets, the model achieves accuracies ranging from 97.58% to 99.90%, which is superior to similar studies on the Arabic document classification task.
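The abstract names a Multi-Kernel CNN but does not give its layer details here. As a minimal sketch of the general technique, and not the authors' SATCDM configuration, the forward pass below slides filters of several widths over a sequence of word vectors, applies ReLU and max-over-time pooling, concatenates one pooled value per filter, and maps the result to class probabilities with a softmax layer. The kernel sizes (3, 4, 5), the number of filters per size, the random untrained weights, and the names `MultiKernelCNN` and `conv_relu_maxpool` are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # rows = token positions, columns = embedding dimensions


def conv_relu_maxpool(seq: Matrix, filt: Matrix, bias: float) -> float:
    """Slide one filter spanning len(filt) tokens over seq, apply ReLU,
    and keep the maximum over all positions (max-over-time pooling).
    If seq is shorter than the filter, no window fits and 0.0 is returned
    (real models usually pad the input instead)."""
    k = len(filt)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a valid starting maximum
    for start in range(len(seq) - k + 1):
        s = bias
        for i in range(k):
            s += sum(w * x for w, x in zip(filt[i], seq[start + i]))
        best = max(best, s)  # running max starting at 0 == ReLU followed by max-pool
    return best


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class MultiKernelCNN:
    """Untrained forward pass of a multi-kernel text CNN (hypothetical sizes)."""

    def __init__(self, emb_dim: int, n_classes: int,
                 kernel_sizes: Tuple[int, ...] = (3, 4, 5),
                 filters_per_size: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One (filter, bias) pair per filter; a size-k filter covers k consecutive tokens.
        self.filters: List[Tuple[Matrix, float]] = []
        for k in kernel_sizes:
            for _ in range(filters_per_size):
                filt = [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for _ in range(k)]
                self.filters.append((filt, 0.0))
        n_features = len(self.filters)
        # Dense output layer mapping the concatenated pooled features to class scores.
        self.w_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                      for _ in range(n_classes)]
        self.b_out = [0.0] * n_classes

    def forward(self, seq: Matrix) -> Vector:
        # One pooled feature per filter, concatenated across all kernel sizes.
        features = [conv_relu_maxpool(seq, f, b) for f, b in self.filters]
        logits = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(self.w_out, self.b_out)]
        return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(1)
    doc = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(10)]  # 10 tokens, 8-dim vectors
    model = MultiKernelCNN(emb_dim=8, n_classes=4)
    print(model.forward(doc))  # four class probabilities that sum to 1
```

Using several kernel widths lets the same layer respond to word patterns of different lengths (roughly trigrams to 5-grams in this sketch); training the weights is omitted.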

Highlights

Classification of text documents is of high importance for many Natural Language Processing (NLP) technologies
This study presents a deep learning model that is based on Convolutional Neural Network (CNN) and n-gram word embedding language models with sub-word information
The Superior Arabic Text Categorization Deep Model (SATCDM) outperforms the models of similar Arabic text categorization studies, with accuracy ranging from 97.58% to 99.90%
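The "sub-word information" mentioned in the highlights refers to representing a word through its character n-grams, as fastText does. The sketch below is a simplified illustration, not fastText's implementation or the authors' code: it wraps the word in `<` and `>` boundary markers, extracts n-grams of length 3 to 6, hashes each into a fixed number of buckets with standard FNV-1a (which may not reproduce fastText's exact hash values), and averages the bucket vectors. The bucket vectors are deterministic random stand-ins for learned embeddings, and the function names, `dim`, and `buckets` values are hypothetical.

```python
import random
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(min_n, max_n + 1) for i in range(len(w) - n + 1)]


def fnv1a_32(s: str) -> int:
    """Standard 32-bit FNV-1a hash over the UTF-8 bytes of s."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def bucket_vector(bucket: int, dim: int) -> List[float]:
    # Hypothetical stand-in for a learned embedding row: deterministic random values.
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def word_vector(word: str, dim: int = 4, buckets: int = 1000) -> List[float]:
    """Average the vectors of the whole bracketed word and its character n-grams."""
    units = [f"<{word}>"] + char_ngrams(word)
    vec = [0.0] * dim
    for unit in units:
        row = bucket_vector(fnv1a_32(unit) % buckets, dim)
        vec = [a + b for a, b in zip(vec, row)]
    return [v / len(units) for v in vec]


if __name__ == "__main__":
    known, unseen = "كتاب", "الكتاب"  # "book", "the book"
    shared = set(char_ngrams(known)) & set(char_ngrams(unseen))
    print(sorted(shared))  # n-grams that contribute to both words' vectors
    print(word_vector(unseen))
```

Because "الكتاب" shares n-grams such as "كتا" and "تاب>" with "كتاب", a prefixed form that never appeared in training still receives a vector related to its stem, which is one reason sub-word information suits a highly derivational language like Arabic.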

Summary

Introduction

Classification of text documents is of high importance for many NLP technologies. Document classification is the process of assigning documents to classes based on their contents. Classifying Arabic documents has always been challenging because of the nature of the language itself, with its rich dialects and enormous number of synonyms. The scarcity of Arabic resources compared with languages such as English, inaccurate stemming algorithms, the highly derivational nature of Arabic, and the ambiguity introduced by diacritics further complicate the task [1], [2]. Categorizing Arabic text documents is considered an important research topic in the field of Arabic Natural Language Processing (ANLP) and Machine Learning (ML), and classifying Arabic documents into specific classes is of high importance to many people and applications.
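The diacritic ambiguity mentioned above can be illustrated with a common preprocessing step, which is not necessarily the one used in this study: stripping short-vowel marks so that vocalized and unvocalized spellings map to the same token. The range U+064B to U+0652 covers tanween, the short vowels, shadda, and sukun; also removing superscript alef (U+0670) and the tatweel stretching character (U+0640) is an assumption made here for illustration, and `strip_diacritics` is a hypothetical helper name.

```python
import re

# Tanween, short vowels, shadda and sukun (U+064B..U+0652), plus superscript
# alef (U+0670) and the tatweel stretching character (U+0640).
_ARABIC_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics and tatweel so spelling variants share one token."""
    return _ARABIC_MARKS.sub("", text)


if __name__ == "__main__":
    kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # "he wrote"
    kutub = "\u0643\u064f\u062a\u064f\u0628"         # "books"
    print(strip_diacritics(kataba) == strip_diacritics(kutub))  # True: both become the same letters
```

The example also shows the cost of this normalization: "he wrote" and "books" collapse into one undiacritized form, so the classifier must rely on context to tell them apart.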

Similar Papers

Research On Text Classification Based On Deep Neural Network
Deageon Kim
International Journal of Communication Networks and Information Security (IJCNIS) | VOL. 14
31 Dec 2022

Word Embeddings for Natural Language Processing
01 Jan 2015

A NOVEL ARABIC CORPUS FOR TEXT CLASSIFICATION USING DEEP LEARNING AND WORD EMBEDDING
Roua A Abou Khachfeh ... Ziad Osman
BAU Journal - Science and Technology | VOL. 3
30 Dec 2021

An effective approach for Arabic document classification using machine learning
Abdullah Y Muaad ... R Bhairava
Global Transitions Proceedings | VOL. 3
02 Apr 2022
