This paper introduces DERCo (the Dublin EEG-based Reading Experiment Corpus), a language resource combining electroencephalography (EEG) and next-word prediction data obtained from participants reading narrative texts. The dataset comprises behavioral data collected from 500 participants recruited through the Amazon Mechanical Turk crowdsourcing platform, along with EEG recordings from 22 healthy adult native English speakers. The online experiment was designed to examine context-based word prediction in a large sample of participants, while the EEG experiment was developed to extend the validation of behavioral next-word predictability. Online participants were instructed to predict upcoming words and complete entire stories. Cloze probabilities were then calculated for each word so that this predictability measure could support analyses of semantic context effects in the EEG recordings. EEG-based analyses revealed significant differences between high- and low-predictability words, demonstrating one important type of analysis that requires close integration of the two datasets. By providing word-level EEG recordings in context, this material is a valuable resource for researchers in neurolinguistics.
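
As an illustration of the cloze measure mentioned above: cloze probability is conventionally the proportion of participants who produced a given word at a given position in the context. The sketch below assumes that definition; the function name `cloze_probabilities`, the normalization steps (lowercasing and whitespace stripping), and the sample responses are hypothetical and are not taken from the corpus or its processing pipeline.

```python
from collections import Counter


def cloze_probabilities(responses, target):
    """Return (cloze probability of `target`, distribution over all responses).

    `responses` holds the words participants predicted for one position.
    Normalization here (strip + lowercase) is an assumption for illustration.
    """
    normalized = [r.strip().lower() for r in responses if r.strip()]
    if not normalized:
        # No usable responses: the target has no measurable predictability.
        return 0.0, {}
    counts = Counter(normalized)
    total = len(normalized)
    distribution = {word: n / total for word, n in counts.items()}
    return distribution.get(target.strip().lower(), 0.0), distribution


# Invented example responses for a single word position.
responses = ["dog", "Dog", "cat", "dog ", "bird"]
p, dist = cloze_probabilities(responses, "dog")
print(p, dist)
```

Words could then be grouped into high- and low-predictability bins by thresholding these values, although the thresholds used for the corpus are not stated here.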