Abstract

We present a multiattentive recurrent neural network architecture for automatic multilingual readability assessment. This architecture considers raw words as its main input, but internally captures text structure and informs its word attention process using other syntax- and morphology-related datapoints, known to be of great importance to readability. This is achieved by a multiattentive strategy that allows the neural network to focus on specific parts of a text for predicting its reading level. We conducted an exhaustive evaluation using data sets targeting multiple languages and prediction task types, to compare the proposed model with traditional, state-of-the-art, and other neural network strategies.

Highlights

  • Readability assessment has been used by diverse stakeholders, from educators to public institutions, for determining the complexity of texts (Benjamin, 2012)

  • Xmij always contains all possible morphological tags considered for the language, assigning a "not applicable" (NA) value when a tag cannot be applied to the token; for example, tense would have a value of NA for all nouns

  • We describe the strategies considered in our assessment, including traditional formulas, state-of-the-art tools based on extensive feature engineering, and neural network structures intended for an ablation study on major components of Vec2Read
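
The fixed-length morphological representation described in the second highlight can be illustrated with a minimal sketch. The tag inventory (`MORPH_TAGS`), the helper `morph_vector`, and the example tag values are hypothetical and chosen only for illustration; the tag set and encoding actually used by the authors are not given in this excerpt.

```python
# Hypothetical tag inventory for one language (an assumption, not the paper's tag set).
MORPH_TAGS = ("Number", "Tense", "Gender")
NA = "NA"  # placeholder for tags that do not apply to a token


def morph_vector(token_tags):
    """Return one value per tag in MORPH_TAGS, using NA where a tag does not apply."""
    return [token_tags.get(tag, NA) for tag in MORPH_TAGS]


# A noun carries no tense, so its Tense slot is filled with NA.
noun = morph_vector({"Number": "Sing", "Gender": "Fem"})
# A finite verb may carry tense but, in this toy inventory, no gender.
verb = morph_vector({"Number": "Sing", "Tense": "Past"})
```

Because every token yields a vector of the same length, tokens with different parts of speech can share one input layout, with NA marking the slots that do not apply.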


Summary

Introduction

Readability assessment has been used by diverse stakeholders, from educators to public institutions, for determining the complexity of texts (Benjamin, 2012). To improve the quality of automatic readability assessment, researchers turned to more sophisticated techniques that go beyond examining shallow features. These techniques, typically based on supervised machine learning, incorporate hundreds (even thousands) of features that describe a text from multiple perspectives: syntax, morphology, cohesion, discourse structure, and subject matter (Dell’Orletta et al., 2011; Francois and Fairon, 2012; Denning et al., 2016; Arfe et al., 2018). The dependency on these numerous features has made readability assessment tools too complex to deploy and to apply to languages beyond the one for which they were originally designed. Feature and language dependency, along with a lack of homogeneity in readability scales, often prevent researchers from comparing new strategies with state-of-the-art counterparts, which in turn hinders community consensus on which features are the most beneficial for capturing text complexity (De Clercq and Hoste, 2016).

