Automated Think-Aloud Protocol for Identifying Students with Reading Comprehension Impairment Using Sentence Embedding

Yongseok Yoo

doi:10.3390/app14020858

Abstract

The think-aloud protocol is a valuable tool for investigating readers’ cognitive processes during reading. However, its reliance on experienced human evaluators limits its efficiency and scalability. To address this limitation, this study proposes a novel application of natural language processing to automate the think-aloud protocol. Specifically, we use a sentence embedding technique to encode the stimulus text and the corresponding readers’ responses into high-dimensional vectors, and the similarity between these embeddings serves as a feature. We examine the properties of this feature for word-frequency-based and contextualized embedding models and compare it between poor comprehenders and normal readers. Using this feature, we trained seven machine learning models to classify readers into normal and abnormal groups. The highest F1 score, 0.74, was achieved with the contextualized embedding and a random forest classifier. This result indicates that the embedding technique extracts features that are useful for automating the think-aloud protocol in the assessment of reading comprehension ability. The potential benefits of this automated approach include increased efficiency and scalability, which may facilitate the early diagnosis of reading comprehension impairment and individualized interventions.
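The similarity feature described above can be illustrated with a minimal sketch. It assumes a toy word-count embedding as a stand-in for a word-frequency-based model and cosine similarity as the similarity measure; the abstract names neither the specific embedding models nor the similarity metric, so both are assumptions. The function names `embed`, `cosine_similarity`, and `similarity_features` and the example sentences are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy word-frequency embedding (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sentence shares nothing with any other
    return dot / (norm_a * norm_b)


def similarity_features(stimuli, responses):
    """One feature per (stimulus sentence, reader response) pair."""
    return [cosine_similarity(embed(s), embed(r))
            for s, r in zip(stimuli, responses)]


# Hypothetical stimulus sentences and think-aloud responses.
stimuli = ["The fox ran into the forest.", "It hid under a fallen tree."]
responses = ["The fox ran to the forest.", "I like pizza."]
features = similarity_features(stimuli, responses)
```

In this sketch, a response that paraphrases the stimulus yields a high similarity, whereas an off-topic response yields a similarity of zero. A per-reader vector of such values could then be passed to a classifier such as a random forest; a contextualized embedding model would replace `embed` with a dense sentence encoder while leaving the rest of the pipeline unchanged.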

Full Text