Recognition of English and Russian-language texts based on frequency characteristics

Y Kotov, O Sanina

doi:10.1088/1757-899x/1019/1/012031

Open Access

Journal: IOP Conf. Series: Materials Science and Engineering
Publication Date: Jan 1, 2021
Citations: 1
License type: CC BY

Affiliation: Novosibirsk State Technical University

Abstract

Automated text analysis requires distinguishing texts in one language from texts in other languages. The paper presents criteria and critical values for recognizing English-language and Russian-language texts, and evaluates these criteria experimentally. It describes methods for estimating the size of character codes and for identifying the space character in a text. An algorithm for recognizing English and Russian texts in an arbitrary encoding is studied, and its accuracy is estimated experimentally.
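The abstract does not state the criteria themselves, so the following is only a minimal sketch of the general idea of frequency-based analysis that does not depend on the encoding. It assumes that the space is the most frequent symbol in natural-language text (a common heuristic, not necessarily the authors' criterion) and that the code size in bytes is already known. The function names `split_units`, `guess_space` and `rank_profile` are hypothetical and do not come from the paper.

```python
from collections import Counter


def split_units(data: bytes, width: int) -> list[bytes]:
    """Cut a byte string into fixed-width code units, dropping a ragged tail."""
    usable = len(data) - len(data) % width
    return [data[i:i + width] for i in range(0, usable, width)]


def guess_space(data: bytes, width: int = 1) -> bytes:
    """Assumption: the most frequent code unit is the space character."""
    return Counter(split_units(data, width)).most_common(1)[0][0]


def rank_profile(data: bytes, width: int = 1, top: int = 10) -> list[float]:
    """Relative frequencies of the `top` most common units, highest first.

    The profile uses only ranks and counts, never the code values,
    so it does not change when the text is re-encoded with another
    fixed-width encoding.
    """
    units = split_units(data, width)
    counts = Counter(units)
    return [n / len(units) for _, n in counts.most_common(top)]


english = "the quick brown fox jumps over the lazy dog"
russian = "съешь ещё этих мягких французских булок да выпей чаю"

space_ascii = guess_space(english.encode("ascii"), width=1)
space_utf16 = guess_space(english.encode("utf-16-le"), width=2)
space_cp1251 = guess_space(russian.encode("cp1251"), width=1)
profile = rank_profile(english.encode("ascii"))
```

A recognizer along these lines would compare such a profile against reference profiles for each language and apply a threshold; the actual criteria and critical values are those given in the paper, not the ones sketched here.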

Full Text