Response times (RTs) and their distributions serve as a powerful lens on cognitive processes. We present a novel statistical methodology, Stratified Distributional Analysis (SDA), to quantitatively assess how key determinants of response times (word frequency and word length) shape their distributions. Taking advantage of the millions of lexical decision response times available in the English Lexicon Project and the British Lexicon Project, we examined the theoretical link between response times and word frequency by analysing RT distributions as a function of word frequency and word length. We tested these distributions against the lognormal, Wald, and Gamma distributions, using three measures of word occurrence: word form frequency obtained from subtitles, and contextual diversity operationalized both as discourse contextual diversity and as user contextual diversity. We found that the RT distributions were best described by a lognormal distribution across both megastudies when word occurrence was quantified by a contextual diversity measure. The link between the lognormal distribution and its generative process highlights the power of SDA in elucidating, through the fitting of probability distributions, the mechanisms that govern the generation of RTs. Using a hierarchical Bayesian framework, SDA yielded posterior distributions for the distributional parameters at the single-participant level, enabling probabilistic predictions of response times as a function of word frequency and word length. These predictions have the potential to serve as a diagnostic tool for uncovering idiosyncratic features of word processing. Crucially, while we applied our parsimonious methodology to lexical decision response times, it is applicable to data from a variety of tasks, such as word naming and eye tracking.
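To make the distribution comparison concrete, the following is a minimal sketch of how the three candidate families could be compared within each stratum of word frequency and word length. It is not the hierarchical Bayesian procedure described above: it swaps in per-stratum maximum likelihood fits with SciPy and an AIC comparison, fixes the location parameter at zero, and assumes RTs are expressed in seconds. The names `compare_rt_fits` and `fit_by_stratum` are hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy import stats

# Candidate families named in the abstract. scipy's invgauss is the
# Wald (inverse Gaussian) family.
CANDIDATES = {
    "lognormal": stats.lognorm,
    "wald": stats.invgauss,
    "gamma": stats.gamma,
}


def compare_rt_fits(rts_seconds):
    """Fit each candidate by maximum likelihood; return params and AIC."""
    rts = np.asarray(rts_seconds, dtype=float)
    results = {}
    for name, dist in CANDIDATES.items():
        # Assumption: location fixed at 0, leaving 2 free parameters
        # (one shape, one scale) for every family.
        params = dist.fit(rts, floc=0)
        loglik = dist.logpdf(rts, *params).sum()
        results[name] = {"params": params, "aic": 2 * 2 - 2 * loglik}
    return results


def fit_by_stratum(rts_seconds, stratum_labels):
    """Run the comparison separately within each stratum.

    A stratum label could be, for example, a (frequency bin, word length)
    pair encoded as a string; the encoding is an assumption of this sketch.
    """
    rts = np.asarray(rts_seconds, dtype=float)
    labels = np.asarray(stratum_labels)
    return {
        label: compare_rt_fits(rts[labels == label])
        for label in np.unique(labels)
    }
```

Within each stratum, the family with the lowest AIC is the best supported of the three under these assumptions. Because the lognormal and Wald densities can be very similar for moderately skewed data, small AIC differences in a single stratum should be read cautiously.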