Abstract

Sentence complexity assessment is a relatively new task in Natural Language Processing. One of its aims is to highlight which sentences in a text are more complex, in order to support the simplification of content for a target audience (e.g., children, cognitively impaired users, non-native speakers, and low-literacy readers (Scarton and Specia, 2018)). The task is evaluated on datasets of aligned sentence pairs, each containing the complex and the simple version of the same sentence. For Brazilian Portuguese, the task was addressed by Leal et al. (2018), who built the first dataset to evaluate it in this language and reached 87.8% accuracy with linguistic features. The present work advances these results using models inspired by Gonzalez-Garduno and Søgaard (2018), which hold the state of the art for English, with multi-task learning and eye-tracking measures. First-Pass Duration, Total Regression Duration, and Total Fixation Duration were used in two stages: first to select a subset of linguistic features, and then as an auxiliary task in the multi-task and sequential learning models. The best model proposed here sets a new state of the art for Portuguese with 97.5% accuracy, an increase of almost 10 points over the best previous result. In addition, we propose improvements to the public dataset after analysing the errors of our best model.
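The abstract evaluates models on aligned (complex, simple) sentence pairs. One common way to turn such pairs into an accuracy figure, sketched here under our own assumptions and not necessarily the exact protocol of Leal et al. (2018) or of this work, is to count how often a scorer gives the higher complexity score to the complex member of each pair. The names `pairwise_accuracy` and `length_score` are hypothetical, the length-based scorer is a toy baseline rather than the paper's model, and the example sentences are invented.

```python
"""Pairwise evaluation of a sentence-complexity scorer (illustrative sketch).

Assumptions (not taken from the paper): each example is a (complex, simple)
pair of aligned sentences, and a model is any function mapping a sentence
to a real-valued complexity score.
"""
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (complex_sentence, simple_sentence)


def pairwise_accuracy(pairs: List[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs in which the complex sentence gets the strictly higher score."""
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for complex_s, simple_s in pairs if score(complex_s) > score(simple_s))
    return correct / len(pairs)


def length_score(sentence: str) -> float:
    """Hypothetical baseline scorer: more whitespace-separated tokens = more complex."""
    return float(len(sentence.split()))


if __name__ == "__main__":
    # Invented toy pairs in Portuguese, for illustration only.
    toy_pairs = [
        ("O pesquisador, que chegou ontem, apresentou os resultados preliminares.",
         "O pesquisador apresentou os resultados."),
        ("Ela saiu.", "Ela foi embora de casa."),
    ]
    print(pairwise_accuracy(toy_pairs, length_score))
```

With the toy pairs, the length baseline ranks the first pair correctly (9 tokens against 5) and the second incorrectly (2 against 5), giving an accuracy of 0.5; this shows why shallow length features alone are a weak signal for sentence complexity.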

Highlights

  • Readability is the ease of reading a text, determined not by typographical aspects such as font size but by measures such as syntactic complexity, vocabulary frequency, content, style, and organisation, which can be matched to the reader's prior knowledge, reading skill, interest, and motivation (Dubay, 2007). Tracing the automation of readability back to its origins, the first readability formulas appeared about a century ago in the United States, to help teachers, librarians, and scholars select reading material for classes (Davison and Green, 1988; Bohn, 1990).

  • Readability analysis has become a large multidisciplinary research area with an ever-growing body of literature and related tasks (e.g., text simplification (Vajjala and Meurers, 2014a) and text summarization (Vodolazova and Lloret, 2019)), and in this century it has gained new computational approaches based on Natural Language Processing (NLP) and Machine Learning methods (Collins-Thompson, 2014).

  • The same approach can be used to assess text complexity at the sentence level; Dell’Orletta et al. (2014) demonstrated that readability prediction at the sentence level requires a greater number of features than at the document level.
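To make concrete what a readability formula computes, the sketch below implements the Flesch Reading Ease score, a later and widely known English formula of this family; it is not used in the work summarised here. The score is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), and higher values mean easier text. The function takes pre-computed counts, because reliable syllable counting is language-specific and is left out by assumption.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease for English text, computed from pre-counted totals.

    Higher scores indicate easier text. Counting words, sentences and
    syllables is assumed to be done elsewhere.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # A passage of 100 words in 5 sentences with 150 syllables scores about 59.6.
    print(round(flesch_reading_ease(words=100, sentences=5, syllables=150), 1))
```

Formulas like this use only two surface ratios, which is one reason why sentence-level assessment, with far less text per prediction, turns to richer linguistic features.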


Summary

Introduction

Readability is the ease of reading a text, determined not by typographical aspects such as font size but by measures such as syntactic complexity, vocabulary frequency, content, style, and organisation, which can be matched to the reader's prior knowledge, reading skill, interest, and motivation (Dubay, 2007). Tracing the automation of readability back to its origins, the first readability formulas appeared about a century ago in the United States, to help teachers, librarians, and scholars select reading material for classes (Davison and Green, 1988; Bohn, 1990). The same approach can be used to assess text complexity at the sentence level; Dell’Orletta et al. (2014) demonstrated that readability prediction at the sentence level requires a greater number of features than at the document level. Gonzalez-Garduno and Søgaard (2018) achieved state-of-the-art performance in readability prediction for English sentences using multi-task learning and eye-tracking measures. One application of the sentence-level approach is the complexity checker tool proposed by Scarton et al. (2017), which analyses all sentences in a text and highlights the complex ones to support the simplification process.

Proceedings of the 28th International Conference on Computational Linguistics, pages 5821–5831, Barcelona, Spain (Online), December 8–13, 2020.
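The sentence-level idea behind a complexity checker (split a text into sentences, score each one, and highlight those above a threshold) can be sketched as follows. This is a minimal illustration assuming a hand-weighted linear score over two shallow features; the feature set, the `WEIGHTS` values, the `THRESHOLD`, and all function names are hypothetical, and this is neither the tool of Scarton et al. (2017) nor the model proposed in this work.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical weights and threshold, chosen only for illustration;
# a real checker would learn them from annotated data.
WEIGHTS: Dict[str, float] = {"n_words": 0.1, "mean_word_len": 0.5}
THRESHOLD = 4.0


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def features(sentence: str) -> Dict[str, float]:
    """Two shallow features: word count and mean word length in characters."""
    words = re.findall(r"\w+", sentence)  # \w is Unicode-aware, so accented letters count
    n = len(words)
    mean_len = sum(len(w) for w in words) / n if n else 0.0
    return {"n_words": float(n), "mean_word_len": mean_len}


def complexity(sentence: str) -> float:
    """Weighted sum of the features; higher means more complex (by assumption)."""
    return sum(WEIGHTS[name] * value for name, value in features(sentence).items())


def flag_complex(text: str, threshold: float = THRESHOLD) -> List[Tuple[str, float, bool]]:
    """Return (sentence, score, is_complex) for every sentence in the text."""
    results = []
    for sentence in split_sentences(text):
        score = complexity(sentence)
        results.append((sentence, score, score > threshold))
    return results


if __name__ == "__main__":
    text = "Ela saiu. O pesquisador apresentou resultados preliminares extremamente interessantes."
    for sentence, score, is_complex in flag_complex(text):
        marker = "COMPLEX" if is_complex else "ok"
        print(f"[{marker}] {score:.2f}  {sentence}")
```

On the invented example, the first sentence scores about 1.95 and the second about 5.63, so only the second is highlighted. Replacing the hand-set linear score with a trained classifier over richer features is the kind of step that the feature-based and multi-task models discussed here take.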

