Abstract

This paper investigates text readability. Quantifying text difficulty by scientific methods is an active research topic and has practical significance. We first explore the main factors that affect readability and build a four-layer index system, from the perspectives of text characteristics and text logic, to measure it. In terms of text length, the index system covers word length, sentence length, and average sentence length; in terms of word counts, it covers the total number of syllables, the number of single-syllable words, the number of multi-syllable words, and the number of common words. This index system is one of the highlights of the paper. The data comprise 319 fourth- and sixth-grade reading materials together with texts from English journals, preprocessed by deletion, word segmentation, and other steps. We built a statistical classification model that systematically computes the six metrics in the first two layers. To count common words and clauses, we built a model based on the high-frequency word lists in COCA (Corpus of Contemporary American English) and on common logical-structure words. We also counted emotional words in order to measure the emotional color of the text. Finally, we normalized the resulting raw data so that all metrics are on the same order of magnitude.
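As a rough illustration of the surface metrics and the normalization step described in the abstract, the following Python sketch computes a few of them for a list of texts. It is not the authors' implementation. Several details are assumptions made for the example: the syllable counter is a simple vowel-group heuristic, `COMMON_WORDS` is a tiny hypothetical stand-in for a COCA high-frequency word list, a "multi-syllable" word is taken to mean two or more syllables, and min-max scaling is used because the abstract does not say which normalization was applied.

```python
import re

# Hypothetical stand-in for a high-frequency word list such as COCA's.
COMMON_WORDS = {"the", "a", "is", "on", "and", "to", "of", "in", "cat", "sat"}


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def extract_features(text):
    """Compute simple length- and count-based readability metrics for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = [count_syllables(w) for w in words]
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "total_syllables": sum(syllables),
        "single_syllable_words": sum(1 for s in syllables if s == 1),
        # Assumption: "multi-syllable" means two or more syllables.
        "multi_syllable_words": sum(1 for s in syllables if s >= 2),
        "common_words": sum(1 for w in words if w.lower() in COMMON_WORDS),
    }


def min_max_normalize(rows):
    """Scale each metric to [0, 1] across all texts (assumed normalization)."""
    normalized = [dict() for _ in rows]
    for key in rows[0]:
        values = [row[key] for row in rows]
        low, high = min(values), max(values)
        span = high - low
        for row, out in zip(rows, normalized):
            # A metric that is constant across texts carries no signal; map it to 0.
            out[key] = (row[key] - low) / span if span else 0.0
    return normalized


texts = ["The cat sat on the mat.", "Readability is complicated."]
features = [extract_features(t) for t in texts]
scaled = min_max_normalize(features)
```

After scaling, every metric lies in the same range, so metrics with large raw values (such as total syllables) do not dominate metrics with small ones (such as average word length) in any later model.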
