Abstract

Text complexity is an important construct in applied linguistics and in research on English reading instruction in the context of globalization. The main purpose of studying it is to match learners with reading texts appropriate to their language level in order to improve learning outcomes. Beyond the lexical and syntactic dimensions, this study adds discourse complexity features and examines their use in constructing text complexity grading models based on feature optimization. Using the text processing software developed by Kyle’s team, with BNCbaby as the reference corpus, we extracted fine-grained indicators of lexical, syntactic and discourse complexity, then applied principal component analysis to reduce their dimensionality and determine the principal component features used to build the model. Separate models were constructed with different classification algorithms and their performance was compared. Models built from the feature set show significant advantages over models built from common traditional readability formulas and other single-dimensional features. In addition, the feature set and modeling method grade other domestic textbook datasets well and perform well in grade prediction across several different datasets, indicating strong generalization ability. By integrating multivariate linguistic features with neural networks to construct a text complexity grading model, this study provides a new path for text complexity research.
The results have theoretical significance for text complexity research and practical value in applied linguistics: they can inform the selection of students’ reading materials, the writing and adaptation of teaching materials, and the planning and test design of reading courses.
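
The pipeline summarized in the abstract (standardize the indicators, reduce them with principal component analysis, then compare classifiers on the reduced features) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three indicator columns, the toy values, the grade labels, and the two simple classifiers are hypothetical stand-ins, not the study's indicators, data, or algorithms. The sketch uses only the Python standard library so that it is self-contained; a dedicated library would normally handle the PCA step.

```python
import math

def fit_pca(rows, k, iters=200):
    """Standardize columns, then find the top-k principal components
    by power iteration with deflation on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    z = [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # deterministic, non-uniform start
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the current direction (its eigenvalue estimate).
        vnorm2 = sum(x * x for x in v)
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d)) / vnorm2
        v = [x / math.sqrt(vnorm2) for x in v]
        comps.append(v)
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, stds, comps

def transform(row, model):
    """Project one raw feature row onto the fitted principal components."""
    means, stds, comps = model
    z = [(x - m) / s for x, m, s in zip(row, means, stds)]
    return [sum(c * x for c, x in zip(comp, z)) for comp in comps]

def nearest_centroid(train_pts, train_labels, point):
    """Predict the label whose class centroid is closest to the point."""
    groups = {}
    for p, y in zip(train_pts, train_labels):
        groups.setdefault(y, []).append(p)
    centroids = {y: [sum(c) / len(ps) for c in zip(*ps)] for y, ps in groups.items()}
    return min(centroids, key=lambda y: math.dist(centroids[y], point))

def one_nn(train_pts, train_labels, point):
    """Predict the label of the single closest training point."""
    i = min(range(len(train_pts)), key=lambda i: math.dist(train_pts[i], point))
    return train_labels[i]

# Hypothetical indicators per text: mean word length, mean sentence length,
# connectives per 100 words. Values are invented toy data.
FEATURES = [[4.0, 8.0, 1.0], [4.2, 9.0, 1.2], [4.1, 8.5, 0.9],
            [5.5, 20.0, 3.0], [5.8, 22.0, 3.4], [5.6, 21.0, 3.1]]
LABELS = ["easy", "easy", "easy", "hard", "hard", "hard"]
HELD_OUT = [([4.1, 8.8, 1.1], "easy"), ([5.7, 21.5, 3.2], "hard")]

model = fit_pca(FEATURES, k=2)
reduced = [transform(r, model) for r in FEATURES]

def accuracy(clf):
    """Share of held-out texts that the classifier grades correctly."""
    hits = sum(clf(reduced, LABELS, transform(x, model)) == y for x, y in HELD_OUT)
    return hits / len(HELD_OUT)

scores = {clf.__name__: accuracy(clf) for clf in (nearest_centroid, one_nn)}
print(scores)
```

The held-out texts are standardized with the training means and standard deviations before projection, so no information from the evaluation data leaks into the fitted components. With real data the comparison would use proper cross-validation and the classifiers actually under study.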
