The subjective frequency of word n-grams

Cyrus Shaoul,Harald Baayen,Chris Westbury

doi:10.2298/psi1304497s

Cyrus Shaoul, Harald Baayen + Show 1 more

Open Access

https://doi.org/10.2298/psi1304497s

Copy DOI

Journal: Psihologija	Publication Date: Jan 1, 2013
Citations: 33	License type: CC BY-SA 4.0

Affiliation: University of Tübingen, University of Alberta

Abstract

When asked to think about the subjective frequency of an n-gram (a group of n words), what properties of the n-gram influence the respondent? It has been recently shown that n-grams that occurred more frequently in a large corpus of English were read faster than n-grams that occurred less frequently (Arnon & Snider, 2010), an effect that is analogous to the frequency effects in word reading and lexical decision. The subjective frequency of words has also been extensively studied and linked to performance on linguistic tasks. We investigated the capacity of people to gauge the absolute and relative frequencies of n-grams. Subjective frequency ratings collected for 352 n-grams showed a strong correlation with corpus frequency, in particular for n-grams with the highest subjective frequency. These n-grams were then paired up and used in a relative frequency decision task (e.g. Is green hills more frequent than weekend trips?). Accuracy on this task was reliably above chance, and the trial-level accuracy was best predicted by a model that included the corpus frequencies of the whole n-grams. A computational model of word recognition (Baayen, Milin, Djurdjevic, Hendrix, & Marelli, 2011) was then used to attempt to simulate subjective frequency ratings, with limited success. Our results suggest that human n-gram frequency intuitions arise from the probabilistic information contained in n-grams.

Full Text