Abstract
Accounts of human language comprehension propose different mathematical relationships between the contextual probability of a word and how difficult it is to process, including linear, logarithmic, and super-logarithmic ones. However, the empirical evidence favoring any of these over the others is mixed, appearing to vary depending on the index of processing difficulty used and the approach taken to calculate contextual probability. To help disentangle these results, we focus on the mathematical relationship between corpus-derived contextual probability and the N400, a neural index of processing difficulty. Specifically, we use 37 contemporary transformer language models to calculate the contextual probability of stimuli from 6 experimental studies of the N400, and test whether N400 amplitude is best predicted by a linear, logarithmic, super-logarithmic, or sub-logarithmic transformation of the probabilities calculated using these language models, as well as combinations of these transformed metrics. We replicate the finding that on some datasets, a combination of linearly and logarithmically-transformed probability can predict N400 amplitude better than either metric alone. In addition, we find that overall, the best single predictor of N400 amplitude is sub-logarithmically-transformed probability, which for almost all language models and datasets explains all the variance in N400 amplitude otherwise explained by the linear and logarithmic transformations. This is a novel finding that is not predicted by any current theoretical accounts, and thus one that we argue is likely to play an important role in increasing our understanding of how the statistical regularities of language impact language comprehension.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.