Abstract
Because of the specific characteristics of Persian, developing natural language processing tools for it involves a range of challenges. The lack of efficient Persian knowledge sources is a further obstacle to research on this language. This article aims to address these problems through the task of spelling correction. The main outputs of this study are a parallel corpus, an N-gram language model for Persian, and a semantics-based spelling correction system named Perspell, which uses the extracted language model. Compared with competing software (including the Vafa spellchecker), Perspell detected and corrected both non-word and real-word errors more successfully. Perspell's real-word error detection rate was 95%. Its strong ability to detect real-word errors and its significant improvement in F-measure are the two main advantages of the proposed system.
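The abstract does not describe Perspell's detection algorithm. The sketch below only illustrates the general idea of using an N-gram language model to flag real-word errors, where a valid word is used in the wrong context. It assumes a bigram model with add-one smoothing, hypothetical confusion sets of easily confused words, and an English toy corpus in place of Persian data. The function names `train_bigram`, `log_prob`, and `detect_real_word_errors` are illustrative, not part of Perspell.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def detect_real_word_errors(tokens, confusion_sets, unigrams, bigrams):
    """Flag positions where swapping in a confusable word (hypothetical confusion
    sets) gives the sentence a strictly higher language-model score.

    Returns a list of (index, original_word, suggested_word) tuples.
    """
    flags = []
    base = log_prob(tokens, unigrams, bigrams)
    for i, word in enumerate(tokens):
        best_word, best_score = word, base
        for alt in confusion_sets.get(word, ()):
            candidate = tokens[:i] + [alt] + tokens[i + 1:]
            score = log_prob(candidate, unigrams, bigrams)
            if score > best_score:
                best_word, best_score = alt, score
        if best_word != word:
            flags.append((i, word, best_word))
    return flags
```

For example, after training on a few sentences such as "i want to go home", the sentence "i want two go home" is flagged at index 2 with the suggestion "to", because "want to go" is far more probable under the model than "want two go". A practical system would use higher-order N-grams, better smoothing, and a decision threshold rather than a bare score comparison.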