Enhancing text classification performance by preprocessing misspelled words in Indonesian language

Reza Setiabudi, Andre Rusli, Ni Made Satvika Iswari

doi:10.12928/telkomnika.v19i4.20369

Abstract

Supervised learning with shallow machine learning methods remains popular for text processing, despite rapid advances in unsupervised deep learning methods. Supervised sentiment classification of application user feedback in the Indonesian language is one such application, popular in both the research community and industry. However, shallow machine learning approaches require various text preprocessing techniques to clean the input data. This research implements and evaluates the Levenshtein distance algorithm for detecting and preprocessing misspelled words in Indonesian before the text is used to train a user feedback sentiment classification model based on multinomial Naïve Bayes. Across several evaluation scenarios, preprocessing misspelled Indonesian words with the Levenshtein distance algorithm proved useful, yielding a promising 8.2% increase in the model’s accuracy when classifying user feedback text by sentiment.
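As a rough illustration of the preprocessing step described above, the following Python sketch computes the Levenshtein distance and uses it to replace an out-of-vocabulary token with its nearest dictionary word. The names `levenshtein`, `correct_word`, `correct_text`, the tiny `VOCABULARY` set, and the `max_distance` threshold are all hypothetical choices made for this example; the abstract does not specify the dictionary or threshold the authors used.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical toy dictionary; a real system would load a full Indonesian word list.
VOCABULARY = {"aplikasi", "bagus", "lambat", "sangat", "tidak"}


def correct_word(word: str, vocabulary=VOCABULARY, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if nothing is close enough."""
    if word in vocabulary:
        return word
    # Tie-break alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda v: (levenshtein(word, v), v))
    return best if levenshtein(word, best) <= max_distance else word


def correct_text(text: str, vocabulary=VOCABULARY) -> str:
    """Lowercase, split on whitespace, and correct each token independently."""
    return " ".join(correct_word(tok, vocabulary) for tok in text.lower().split())
```

Under these assumptions, `correct_text("Aplikasii sangat bgus")` maps the two misspelled tokens to their nearest dictionary entries, and the corrected text could then be vectorized and passed to a multinomial Naïve Bayes classifier. The distance threshold trades off recall of corrections against the risk of rewriting valid words that are simply missing from the dictionary.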


Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)
Publication Date: Aug 1, 2021
Citations: 2
License type: cc-by-sa


Similar Papers

Comparing multi-step ahead building cooling load prediction using shallow machine learning and deep learning models
Raghavendra Chalapathy ... Nguyen Lu Dang Khoa
Sustainable Energy, Grids and Networks | VOL. 28
01 Dec 2021

Building thermal load prediction through shallow machine learning and deep learning
Zhe Wang ... Mary Ann Piette
Applied Energy | VOL. 263
20 Feb 2020

Implementation of Raita Algorithm in Manado-Indonesia Translation Application with Text Suggestion Using Levenshtein Distance Algorithm
Novanka Agnes Sekartaji ... Riza Arifudin
Recursive Journal of Informatics | VOL. 2
30 Sep 2024

Internet Financial News Text Classification Algorithm Based on Blockchain Technology
Long Chen
01 Jan 2021
