Text representation and classification based on bi-gram alphabet

Fatma Elghannam

doi:10.1016/j.jksuci.2019.01.005


Open Access


Abstract


In text classification, texts must be transformed into numeric representations suitable for learning algorithms. The commonly used bag-of-words method has two main problems: the high dimensionality of its vector space and its need for language-dependent tools. In the present study, text classification is performed using a novel bi-gram alphabet approach to construct feature terms. The proposed approach makes two main contributions to the text classification area. First, we have demonstrated that constant feature terms based on the standard alphabet can be used without relying on the documents' vocabularies, which helps reduce the dimensionality of the vector space for large corpora. Second, the approach does not require natural language processing tools. The work shows that collections of Arabic or English text documents can be classified successfully in this way. It achieved approximately 80% savings in vector space and a 2% performance improvement compared with the best previously recorded results on the Arabic Aljazeera News dataset.
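The core idea of a vocabulary-independent feature space can be illustrated with a short sketch. It assumes that the feature terms are all ordered pairs of letters from a fixed alphabet (26 lowercase English letters here, giving 26 × 26 = 676 features), that bi-grams are counted only between adjacent letters inside a word, and that counts are normalized to relative frequencies. These choices, and the names `ALPHABET`, `FEATURE_INDEX` and `bigram_vector`, are illustrative assumptions, not the paper's exact weighting or preprocessing; for Arabic, the Arabic letter set would be substituted.

```python
import string
from collections import Counter

# Assumption: the feature alphabet is the 26 lowercase English letters.
# An Arabic letter set could be substituted here (hypothetical; the paper's
# exact letter inventory is not reproduced).
ALPHABET = string.ascii_lowercase

# Fixed feature space: every ordered letter pair, independent of the corpus,
# so the vector length never grows with the document vocabulary.
FEATURES = [a + b for a in ALPHABET for b in ALPHABET]
FEATURE_INDEX = {bg: i for i, bg in enumerate(FEATURES)}


def bigram_vector(text):
    """Map text to a fixed-length vector of relative letter bi-gram frequencies.

    Assumption: bi-grams are taken only between adjacent letters within a word;
    any character outside ALPHABET (space, digit, punctuation) ends the word.
    """
    letters = set(ALPHABET)
    counts = Counter()
    word = []
    for ch in text.lower() + " ":  # trailing space flushes the final word
        if ch in letters:
            word.append(ch)
        else:
            # Count each adjacent letter pair in the finished word.
            for a, b in zip(word, word[1:]):
                counts[a + b] += 1
            word = []
    vec = [0.0] * len(FEATURES)
    total = sum(counts.values())
    if total == 0:
        return vec  # no letter bi-grams found; avoid division by zero
    for bg, c in counts.items():
        vec[FEATURE_INDEX[bg]] = c / total
    return vec
```

For example, `bigram_vector("The cat sat.")` produces a 676-dimensional vector in which "at" has weight 2/6, because the six within-word bi-grams are th, he, ca, at, sa, at. Every document maps into the same 676 dimensions, so the resulting vectors can be passed to any standard classifier without building a corpus vocabulary.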


Journal: Journal of King Saud University - Computer and Information Sciences
Publication Date: Jan 21, 2019
Citations: 18
License type: cc-by-nc-nd


Similar Papers

An Overview of Bag of Words; Importance, Implementation, Applications, and Challenges
Wisam A Qader ... Musa M Ameen
01 Jun 2019

Research On Text Classification Based On Deep Neural Network
Deageon Kim
International Journal of Communication Networks and Information Security (IJCNIS) | VOL. 14
31 Dec 2022

Text Classification Based on Neural Network Fusion
Deageon Kim
Tehnički glasnik | VOL. 17
19 Jul 2023

Designing Explainable Text Classification Pipelines: Insights from IT Ticket Complexity Prediction Case Study
Aleksandra Revina ... Krisztian Buza
01 Jan 2020
