Automated Spelling Error Detection in Assamese Texts using Deep Learning Approaches

Rituraj Phukan, Pritom Jyoti Goutom, Mandira Neog, Nomi Baruah

doi:10.1016/j.procs.2024.04.159

Abstract

The growing prevalence of regional languages, including Assamese, in user-generated content on social media platforms has brought the problem of spelling errors into focus. In natural language processing (NLP) tasks, especially hate speech and toxic comment detection, accurate spelling is critical so that misspelled terms do not slip through undetected. This highlights the need for effective approaches to spelling issues in such language-based applications. This work proposes a novel approach for identifying word-level spelling errors in Assamese text. The approach employs two deep learning techniques, Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM), to detect misspelled words by analyzing the context of each word within a sentence. The models are trained on a dataset of 2677 Assamese sentences that include both correctly and incorrectly spelled words. Both models show significant improvement over previous spelling detection research in Assamese: LSTM achieved an accuracy of 92.72%, while BiLSTM achieved 94.59%. These results highlight the efficacy of deep learning methods in identifying spelling errors in the Assamese language. By detecting misspellings precisely and reducing misunderstandings caused by misspelled words, the proposed approach can improve the reliability of NLP tasks such as identifying toxic comments and hate speech.

Full Text