Abstract
Text readability assessment aims to automatically evaluate how difficult a given text is to read for a specific group of readers. Most previous studies treated it as a classification task and explored a wide range of linguistic features, such as semantic and syntactic features, to capture the readability of a text from different aspects. Intuitively, as the external form of a text becomes more complex, readers experience greater reading difficulty. Motivated by this intuition, our research separates the external form of a text from its content and investigates how effective that form is in determining readability. Specifically, we introduce a new concept, textual form complexity, to provide a novel insight into text readability. The main idea is that the readability of a text can be measured by how challenging it is for readers to overcome the distractions of the external textual form and grasp the text’s core semantics. To this end, we propose a set of textual form features that express the complexity of a text's outer form and characterize its readability. Findings show that the proposed external textual form features can serve as effective indicators of text readability. This work thus brings a new perspective to existing research and complements the rich set of existing features.