Abstract

When asked to judge the subjective frequency of an n-gram (a group of n words), what properties of the n-gram influence the respondent? It has recently been shown that n-grams that occur more frequently in a large corpus of English are read faster than n-grams that occur less frequently (Arnon & Snider, 2010), an effect analogous to the frequency effects found in word reading and lexical decision. The subjective frequency of words has also been studied extensively and linked to performance on linguistic tasks. We investigated people's capacity to gauge the absolute and relative frequencies of n-grams. Subjective frequency ratings collected for 352 n-grams showed a strong correlation with corpus frequency, in particular for n-grams with the highest subjective frequency. These n-grams were then paired up and used in a relative frequency decision task (e.g., "Is green hills more frequent than weekend trips?"). Accuracy on this task was reliably above chance, and trial-level accuracy was best predicted by a model that included the corpus frequencies of the whole n-grams. A computational model of word recognition (Baayen, Milin, Djurdjevic, Hendrix, & Marelli, 2011) was then used to attempt to simulate the subjective frequency ratings, with limited success. Our results suggest that human n-gram frequency intuitions arise from the probabilistic information contained in n-grams.
