Abstract

Words typically form the basis of psycholinguistic and computational linguistic studies of sentence processing. However, recent evidence shows that the basic units of reading, i.e., the items in the mental lexicon, are not always words but can also be sub-word and supra-word units. To recognize these units, human readers require a cognitive mechanism to learn and detect them. In this paper, we assume that eye fixations during reading reveal the locations of the cognitive units, and that the cognitive units are analogous to the text units discovered by unsupervised segmentation models. We predict eye fixations from model-segmented units in both English and Dutch text. The results show that the model-segmented units predict eye fixations better than word units do. We take this predictive performance as an indication of their plausibility as cognitive units. The Less-is-Better (LiB) model, which finds the units that minimize both long-term and working memory load, offers advantages over the alternative models in both prediction score and efficiency. Our results also suggest that modeling the least-effort principle for the management of long-term and working memory can lead to inferring cognitive units. Overall, the study supports the theory that the mental lexicon stores not only words but also smaller and larger units, suggests that fixation locations during reading depend on these units, and shows that unsupervised segmentation models can discover these units.

Highlights

  • Language researchers may agree that an utterance comprises a sequence of “units,” but it is not easy to come to an agreement on what these units are

  • Short and frequent collocations tend to form individual units, and some of those collocations are not in a syntactic phrase (e.g., i was); long words tend to be divided into subword units

  • We infer the locations of eye fixations from their counts in each word and calculate the lengths of the eye fixation units by assuming eye fixations are located in the middle of the units
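The last highlight can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that a word's fixations are spread evenly across the word, that words are separated by single spaces, and that unit boundaries fall halfway between adjacent fixations, so that each fixation sits roughly in the middle of its unit. The function names `fixation_positions` and `unit_lengths` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fixation_positions(
    words: Sequence[str], counts: Sequence[int]
) -> Tuple[List[float], int]:
    """Estimate character positions of fixations from per-word counts.

    Assumption (not from the source): the k fixations on a word are
    spread evenly over that word, and words are joined by one space.
    Returns the positions and the total text length in characters.
    """
    positions: List[float] = []
    offset = 0  # character index where the current word starts
    for word, k in zip(words, counts):
        n = len(word)
        for j in range(k):
            positions.append(offset + n * (j + 0.5) / k)
        offset += n + 1  # skip the word and the following space
    return positions, max(offset - 1, 0)  # drop the trailing space


def unit_lengths(positions: Sequence[float], text_length: int) -> List[float]:
    """Estimate the length of the unit around each fixation.

    Assumption (not from the source): a boundary lies halfway between
    two adjacent fixations; the first and last units extend to the
    edges of the text.
    """
    if not positions:
        return []
    bounds = [0.0]
    for left, right in zip(positions, positions[1:]):
        bounds.append((left + right) / 2)
    bounds.append(float(text_length))
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    words = ["i", "was", "reading"]
    counts = [0, 1, 2]  # made-up fixation counts, for illustration only
    positions, length = fixation_positions(words, counts)
    print(positions)                        # [3.5, 7.75, 11.25]
    print(unit_lengths(positions, length))  # [5.625, 3.875, 3.5]
```

In this toy input the unfixated word "i" is absorbed into the unit around "was", and the long word "reading" is split into two units, which mirrors the pattern described in the second highlight.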


Summary

Introduction

Language researchers may agree that an utterance comprises a sequence of “units,” but it is not easy to come to an agreement on what these units are. Different perspectives propose different candidates: a linguistic perspective offers its own inventory (Jackendoff, 2002), whereas a statistical perspective offers unigrams, bigrams, trigrams, etc. (Manning and Schütze, 1999). We take a cognitive perspective and aim to identify the cognitive units that serve as building blocks in human language processing. Words seem to be the most generally accepted units, perhaps because the spaces in written European languages steer us toward implicitly assuming that individual words are the most distinctive elements of sentences. Pollatsek and Rayner (1989) summarized ten key questions for the cognitive science of reading; nearly half of them are about words. Another case in point is that there are many models of visual word recognition, such as the Interactive Activation

