Abstract

Textual comprehension is often not adequately acquired despite intense didactic efforts. Its quality is mostly evaluated using subjective criteria. Starting from the assumption that word usage statistics can be used to infer the probability of successful semantic representations, we hypothesized that textual comprehension depends on words with a high occurrence probability (a high degree of familiarity), which is typically inversely proportional to their information entropy. We tested this hypothesis by quantifying word occurrences in a bank of words drawn from Portuguese-language academic theses and using information theory tools to infer degrees of textual familiarity. We found that the lower and upper bounds of the database were delimited by low-entropy words with the highest probabilities of causing incomprehension (i.e., nouns and adjectives) or facilitating semantic decoding (i.e., prepositions and conjunctions). We developed an openly available software suite, CalcuLetra, to implement these algorithms and tested it on publicly available denotative text samples (e.g., articles, essays, and abstracts). We propose that the quantitative model presented here may apply to other languages and could serve as a tool for supporting automated evaluations of textual comprehension, and potentially for assisting the development of teaching materials or the diagnosis of learning disorders.

Highlights

  • Textual comprehension is a cornerstone of learning that contributes to a range of educational and developmental phenomena, including language acquisition, vocabulary building, and literacy (Nowak et al., 2000)

  • Using an information theory-based approach, we developed a software tool (CalcuLetra) for quantifying the average reading comprehension difficulty of a text, under the premise that the words most frequently used in writing are the words most familiar to readers

  • Text comprehension was estimated from the correlations between words' semantic representations and their occurrence probabilities, so that textual complexity reflected the words' semantic entropy and degree of familiarity (an illustrative sketch of this premise follows below)
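The familiarity premise described in these highlights can be illustrated with a minimal sketch. CalcuLetra's actual algorithms and interfaces are not shown in this excerpt, so the function names, tokenization, and fallback rule below are assumptions for illustration only: each word's self-information (−log2 of its occurrence probability in a reference corpus) is taken as a proxy for unfamiliarity, and a text's average per-word information as a rough difficulty score.

    from collections import Counter
    import math
    import re

    def word_information(corpus_text):
        # Tokenize crudely and count occurrences; each word's probability
        # is its relative frequency in the reference corpus.
        words = re.findall(r"\w+", corpus_text.lower())
        counts = Counter(words)
        total = sum(counts.values())
        # Self-information in bits: frequent (familiar) words score low,
        # rare (unfamiliar) words score high.
        return {w: -math.log2(c / total) for w, c in counts.items()}

    def mean_text_information(text, info):
        # Average per-word information of a sample text, used here as a
        # rough proxy for reading difficulty; words absent from the
        # reference corpus fall back to the highest observed value.
        words = re.findall(r"\w+", text.lower())
        fallback = max(info.values())
        return sum(info.get(w, fallback) for w in words) / len(words)

In this sketch, the reference dictionary would be built from a large corpus (e.g., the thesis word bank described in the abstract) and then applied to a sample text (an essay or abstract) to obtain a single familiarity-based difficulty score.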


Introduction

Textual comprehension is a cornerstone of learning that contributes to a range of educational and developmental phenomena, including language acquisition, vocabulary building, and literacy (Nowak et al., 2000). Textual comprehension is often defined as the mental representations formed by semantic connections in a text, wherein codes are translated and combined (Kendeou et al., 2014). This process is determined not by notions of reading difficulty but by rules of statistical linguistics. Features such as entropy, fatigue, and comprehension can be gauged from the mathematical relations of word usage. Zipf (1935) established specific laws governing word frequency across a body of text. He observed that only a few words are used very often, while a much larger number are used only rarely. These rules were mathematically formalized, and it was noted that the most common word in a text corpus occurs about twice as often as the second most frequently used word, a pattern that repeats itself for subsequent frequency rankings (the word at rank n appears 1/n times as often as the most frequent one).
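As a concrete illustration of this rank-frequency relation (a minimal sketch, not drawn from the cited works; the function name and tokenization are assumptions), one can count word frequencies in a corpus and compare the observed counts with the Zipfian prediction f(rank) ≈ f(1)/rank:

    from collections import Counter
    import re

    def zipf_comparison(corpus_text, top_n=10):
        # Rank words by observed frequency and compare each count with the
        # Zipfian prediction: the word at rank n should appear roughly
        # 1/n times as often as the most frequent word.
        words = re.findall(r"\w+", corpus_text.lower())
        ranked = Counter(words).most_common(top_n)
        f1 = ranked[0][1]
        return [
            (rank, word, observed, round(f1 / rank, 1))
            for rank, (word, observed) in enumerate(ranked, start=1)
        ]

Applied to a sufficiently large corpus, the observed and predicted columns should track each other closely for the top-ranked words, which is the regularity Zipf described.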

