Abstract

Text readability is the problem of determining whether a text is suitable for a given group of readers, so a model that assesses the readability of text is of great significance across the disciplines of science, publishing, and education. While text readability has attracted attention since the late nineteenth century for English and other popular languages, it remains relatively underexplored in Vietnamese. Previous studies on this topic in Vietnamese have focused only on shallow word-level features, using surface statistics such as frequency and ratio. Hence, features at higher levels, such as sentence structure and meaning, remain untapped. In this study, we present the most comprehensive analysis of Vietnamese text readability to date, targeting features at all linguistic levels, from lexical and phrasal elements to syntactic and semantic factors. This work pioneers the investigation of the effects of multi-level linguistic features on text readability in the Vietnamese language.

Highlights

  • Text readability is a measure of how easy or difficult a text is to read [1], and thus guides the process of comprehending that text

  • We examined the effects of linguistic features at all levels on the readability assessment of Vietnamese texts

  • We extracted a total of 271 features from Vietnamese textbooks for primary school students and Literature for middle and high school students in Vietnam to explore

Summary

INTRODUCTION

Text readability is a measure of how easy or difficult a text is to read [1], and thus guides the process of comprehending that text. Studies that follow the machine learning approach seek to exploit neural network algorithms whose computational power allows a broader range of features to be handled, and at a deeper level, in order to build classifiers that assign texts to readability levels. Works that demonstrate this approach include Si and Callan [4], Collins-Thompson and Callan [5], Pitler and Nenkova [6], Vajjala and Meurers [7], Sinha and Basu [8], Vajjala and Lučić [9], and Al Khalil et al. [10], among others.
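To make the notion of shallow, surface-level features concrete, the following is a minimal sketch of the kind of frequency and ratio statistics that feature-based readability classifiers typically start from. It is not the feature set used in this study: the function name `surface_features`, the three statistics chosen, and the crude punctuation-based sentence splitting are all assumptions made for illustration. Note also that whitespace tokenization of Vietnamese yields syllables rather than words, so a real pipeline would need a proper word segmenter.

```python
import re
import unicodedata


def surface_features(text: str) -> dict:
    """Compute a few shallow surface statistics (illustrative only)."""
    # Normalize to NFC so accented Vietnamese letters are single code points
    # and are matched as word characters by the regex below.
    text = unicodedata.normalize("NFC", text)

    # Crude sentence split on terminal punctuation (an assumption, not a
    # real sentence segmenter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Lowercased runs of word characters; for Vietnamese these are
    # syllables, not full words.
    tokens = re.findall(r"\w+", text.lower())

    if not tokens or not sentences:
        return {
            "avg_sentence_length": 0.0,
            "avg_token_length": 0.0,
            "type_token_ratio": 0.0,
        }

    return {
        # Average number of tokens per sentence.
        "avg_sentence_length": len(tokens) / len(sentences),
        # Average number of characters per token.
        "avg_token_length": sum(len(t) for t in tokens) / len(tokens),
        # Distinct tokens divided by total tokens (lexical variety).
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


if __name__ == "__main__":
    print(surface_features("The cat sat. The cat ran."))
```

A dictionary like this could serve as one row of a feature matrix for a classifier; features at the syntactic and semantic levels would require parsers and lexical resources that this sketch does not attempt to model.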

RELATED WORKS
25 Part-of-Speech elements
RESEARCH DESIGN AND METHODOLOGY
Features
Corpus
Features Examination
Findings
CONCLUSIONS