Detection and Automatic Correction of Bengali Misspelled Words using N-Gram Model

Antara Pal, Alok Ranjan Pal, Sourav Mallick

doi:10.1109/icaect49130.2021.9392406

Abstract

This paper presents an approach for the detection and automatic correction of Bengali misspelled words using an N-gram model. Most existing work in this domain relies on the minimum edit distance between words. In some cases, however, distance-based error correction cannot provide the correct answer. This happens when more than one suggested word derived by the distance-based approach has the same distance from the misspelled word. The proposed model handles this situation using the local context of the misspelled word, which is analyzed with an N-gram model. In the experiment, error correction is first performed using minimum edit distance on 200 sentences, each containing a misspelled word. Around 40 instances occurred in which more than one suggested word had the same edit distance from the misspelled word. In most of these cases, the proposed strategy successfully finds the contextually most relevant word among the suggestions. In the experiments, a tri-gram model is used to capture the local context of the misspelled word.
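
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `<s>`/`</s>` padding tokens, the toy English lexicon and corpus, and the choice to score a candidate by summing raw counts of the three tri-grams that cover its position are all assumptions made for illustration. For Bengali text the same code would operate on Unicode code points, which is itself a simplification.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance computed with a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def closest_words(word, lexicon):
    """Return every lexicon word tied at the minimum edit distance."""
    dists = {w: edit_distance(word, w) for w in lexicon}
    best = min(dists.values())
    return sorted(w for w, d in dists.items() if d == best)

def trigram_counts(sentences):
    """Count tri-grams over whitespace-tokenized, padded sentences."""
    counts = Counter()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>", "</s>"]
        counts.update(zip(toks, toks[1:], toks[2:]))
    return counts

def correct(tokens, idx, lexicon, counts):
    """Resolve an edit-distance tie using the local tri-gram context.

    Assumed scoring: sum of corpus counts of the three tri-grams
    that contain position idx once the candidate is substituted.
    """
    padded = ["<s>", "<s>"] + tokens + ["</s>", "</s>"]
    p = idx + 2  # position of the misspelled token after padding

    def score(cand):
        ctx = padded[:p] + [cand] + padded[p + 1:]
        return sum(counts[tuple(ctx[k:k + 3])] for k in range(p - 2, p + 1))

    return max(closest_words(tokens[idx], lexicon), key=score)

# Toy data (hypothetical, English for readability).
lexicon = {"cat", "car", "cap"}
corpus = ["the cat sat on the mat", "the car is fast", "a cat sat here"]
counts = trigram_counts(corpus)

sentence = "the caz sat on the mat".split()
tied = closest_words("caz", lexicon)            # all three are at distance 1
fixed = correct(sentence, 1, lexicon, counts)   # context decides the tie
```

Here edit distance alone cannot choose among the three candidates, while the tri-gram counts favor the candidate that has been seen in the same neighborhood. A practical system would likely use smoothed probabilities rather than raw counts, so that unseen tri-grams do not all collapse to a score of zero.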

Full Text