Abstract

Spelling error detection and correction is essential for ensuring high-quality writing in any language. A spell checker identifies possibly misspelled words and offers the most probable suggestions for correcting them. Developing a Bangla spell checker is challenging because lexical resources for Bangla are limited and the rules of the language are hard to generalize. Bangla spelling errors can be broadly categorized as non-word errors and real-word (semantic) errors. Real-word errors are context-sensitive and more complex to correct. In this article, we present a Bangla text correction model for both non-word and real-word errors. The proposed model combines edit distance with an n-gram language model to detect an erroneous target word and suggest corrections. We collected around 100,000 articles from online sources and compiled a proprietary dictionary of n-grams from the tokens of those articles. For our experiment, we compiled 11,500 Bangla sentences to evaluate our model; the proposed model achieves an average performance rate of 96% for error detection across both types of spelling errors.
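
As a rough illustration of how edit distance and an n-gram language model can be combined, the sketch below flags a token either when it is missing from the vocabulary (a non-word error) or when it is a valid word in a context never seen in training (a possible real-word error). It is a minimal sketch under stated assumptions, not the model described in this article: the bigram order, the add-one smoothing, the distance threshold of 1, the function names (`edit_distance`, `train`, `suggest`, `check`) and the tiny English toy corpus are all hypothetical choices made for readability.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def train(sentences):
    """Count unigrams and bigrams; "<s>" marks the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing (an assumed choice)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))

def suggest(prev, word, unigrams, bigrams, max_dist=1, top_k=3):
    """Vocabulary words within max_dist edits, ranked by bigram probability."""
    candidates = [w for w in unigrams
                  if w != "<s>" and edit_distance(word, w) <= max_dist]
    candidates.sort(key=lambda w: (-bigram_prob(prev, w, unigrams, bigrams),
                                   edit_distance(word, w), w))
    return candidates[:top_k]

def check(sentence, unigrams, bigrams):
    """Return (position, word, suggestions) for each suspicious token."""
    flagged = []
    prev = "<s>"
    for pos, word in enumerate(sentence.split()):
        best = word
        if word not in unigrams:
            # Non-word error: the token is not in the vocabulary at all.
            options = suggest(prev, word, unigrams, bigrams)
            flagged.append((pos, word, options))
            if options:
                best = options[0]
        elif bigrams[(prev, word)] == 0:
            # Possible real-word error: a valid word in an unseen context.
            # Flag it only if a near neighbour was seen in this context.
            options = [w for w in suggest(prev, word, unigrams, bigrams)
                       if bigrams[(prev, w)] > 0]
            if options:
                flagged.append((pos, word, options))
                best = options[0]
        prev = best  # continue with the most likely intended word
    return flagged

# Toy corpus standing in for the n-gram dictionary (illustrative only).
corpus = ["i read a book", "she read a book",
          "he will look at it", "i will look at it"]
unigrams, bigrams = train(corpus)

print(check("i read a bok", unigrams, bigrams))     # [(3, 'bok', ['book'])]
print(check("she read a look", unigrams, bigrams))  # [(3, 'look', ['book'])]
```

The sketch compares strings one code point at a time. For Bangla text, where a single written unit may span several code points, a real system would likely need Unicode normalization and a script-aware notion of edit cost; that is left out here.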
