Abstract

This paper considers the information support and algorithms of a software application that finds and corrects spelling and grammatical errors in Russian texts. The application relies on a database whose tables are filled from a morphological dictionary of the Russian language containing more than four million word forms and from texts of different genres. Before correction, the text is divided into fragments, with punctuation marks serving as separators. The fragments are checked and corrected independently of each other. Correction proceeds in two stages. At the first stage, spelling errors and errors caused by incorrect word formation are corrected using a spelling correction method based on the symmetric deletion algorithm. For each erroneous word, a list of replacement candidates is formed. The candidate with the lowest replacement cost (an indicator of how close the candidate is to the word being replaced) is chosen as the replacement. If several candidates have equal replacement cost, preference is given to the candidate that occurs most often in the texts previously used to fill the application database. At the second stage, certain types of grammatical errors are corrected. The correction is based on precedents: occurrences of a “word, next word” pair in texts that have undergone editorial correction. Using the precedents found in the database, the application highlights the words to be replaced. As at the first stage, the replacement is chosen from a list of candidates, but no replacement is made if its cost exceeds the permissible value. The text can be corrected either automatically or by interactively selecting a replacement word.
On a test data set containing both spelling and grammatical errors, the application corrects more words than Microsoft Word and Yandex-speller do.
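
The two-stage procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `Corrector`, the toy `word_counts` and `pair_counts` tables (standing in for the application database), the use of Levenshtein distance as the replacement cost, and the thresholds `max_dist` and `max_cost` are all assumptions made for illustration. The second stage in particular is only one possible reading of the precedent mechanism: a word is flagged when its pair with the preceding word has no precedent, and it is replaced by the closest word that does.

```python
import re
from collections import defaultdict

def deletes(word, max_dist):
    """All strings obtained by deleting up to max_dist characters from word."""
    result, frontier = {word}, {word}
    for _ in range(max_dist):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        result |= frontier
    return result

def edit_distance(a, b):
    """Levenshtein distance, used here as the replacement cost (an assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class Corrector:
    def __init__(self, word_counts, pair_counts, max_dist=1):
        self.word_counts = word_counts  # word form -> occurrences in the texts
        self.pair_counts = pair_counts  # (word, next word) -> occurrences
        self.max_dist = max_dist
        # Symmetric deletion index: every delete variant points to its source words.
        self.index = defaultdict(set)
        for word in word_counts:
            for key in deletes(word, max_dist):
                self.index[key].add(word)

    def fix_spelling(self, word):
        """Stage 1: lowest cost wins; ties go to the more frequent candidate."""
        if word in self.word_counts:
            return word
        found = set()
        for key in deletes(word, self.max_dist):
            found |= self.index.get(key, set())
        scored = [(edit_distance(word, c), -self.word_counts[c], c) for c in found]
        scored = [s for s in scored if s[0] <= self.max_dist]
        return min(scored)[2] if scored else word

    def fix_grammar(self, words, max_cost=2):
        """Stage 2: replace a word whose pair with the previous word has no precedent."""
        out = list(words)
        for i in range(1, len(out)):
            prev, word = out[i - 1], out[i]
            if (prev, word) in self.pair_counts:
                continue
            options = [(edit_distance(word, nxt), -n, nxt)
                       for (p, nxt), n in self.pair_counts.items() if p == prev]
            options = [o for o in options if o[0] <= max_cost]
            if options:  # no replacement if every candidate is too costly
                out[i] = min(options)[2]
        return out

    def correct(self, text):
        """Split on punctuation, then correct each fragment independently."""
        fragments = [f.split() for f in re.split(r"[.,;:!?]+", text.lower()) if f.strip()]
        return [self.fix_grammar([self.fix_spelling(w) for w in frag])
                for frag in fragments]

# Toy data standing in for the database tables (hypothetical counts).
word_counts = {"иду": 5, "в": 20, "школу": 4, "школа": 6, "дом": 7, "дым": 2}
pair_counts = {("иду", "в"): 3, ("в", "школу"): 4, ("в", "дом"): 2}
demo = Corrector(word_counts, pair_counts)
print(demo.correct("иду в шкла, иду в дм"))
# [['иду', 'в', 'школу'], ['иду', 'в', 'дом']]
```

In this toy run, "шкла" is first restored to the dictionary form "школа" and then changed to "школу" because only the pair ("в", "школу") has a precedent; "дм" matches both "дом" and "дым" at cost 1, and the more frequent "дом" is chosen.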
