Abstract
A spelling error is a mistake made while typing a text document. Applications such as search engines, information retrieval systems, and email require typed user input, and a good spell-checker is essential in them to correct misspellings. Spell-checkers for western languages such as English are very powerful and can handle any type of spelling error, whereas the spell-checkers available for Indian languages such as Hindi, Urdu, Bengali, Kannada, and Assamese are very basic. These spell-checkers are built with traditional statistical and rule-based methods. This article presents HINDIA, a novel model for handling spelling errors in Hindi, one of the most widely spoken languages in India. It uses deep learning for spelling error detection and correction. The proposed model works in two phases: in the first phase it identifies the erroneous words in the input sample, and in the second phase it replaces them with the most probable correct words. HINDIA is built on an attention-based encoder–decoder bidirectional recurrent neural network (BiRNN) with long short-term memory cells. Several modifications were made to the BiRNN, and the network was fine-tuned to process Hindi spelling errors. It is trained and tested on the publicly available ‘monolingual corpus’ dataset developed by IIT Mumbai. The performance of the proposed model is evaluated in two scenarios. In the first scenario, where the testing dataset is generated using a split function, HINDIA achieves precision 0.86, recall 0.72, f-measure 0.78, and accuracy 0.80. In the second scenario, where the dataset is manually generated, it achieves precision 0.81, recall 0.72, f-measure 0.76, and accuracy 0.74.
HINDIA outperforms the deep-learning-based Malayalam spell-checker and some other deep-learning-based correction models reported in the literature.
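The reported f-measure values can be cross-checked against the reported precision and recall. The sketch below assumes the f-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state the formula, so this is an assumption, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is this balanced form.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract.
reported = {
    "split-generated test set": (0.86, 0.72),
    "manually generated test set": (0.81, 0.72),
}

for name, (p, r) in reported.items():
    print(f"{name}: F = {f_measure(p, r):.2f}")
```

Under this assumption the two values round to 0.78 and 0.76, which agrees with the figures given for both scenarios.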