ABSTRACT The broader aim of this study is the corpus-based investigation of the written language production process. To this end, keystroke logs record temporal markers alongside the writing process, so that pauses can be used to segment the speech product into linear units of performance. However, identifying these pauses requires selecting the relevant interkey intervals (IKIs) between production microevents. Different models, in particular Gaussian mixture models (GMMs), have been applied to identify components in the empirical distribution of these delays, yet no consensus has emerged regarding the number and interpretation of these components. Here, we analyze IKI distributions from a corpus of keylogs and show that a two-component model is robustly selected across a large range of participants. Furthermore, by identifying which kinds of events fall into each mode, we show that the contents of the modes are consistent with a fluency vs. disfluency interpretation.
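As a rough illustration of what selecting the number of mixture components can look like, the sketch below fits one-dimensional Gaussian mixtures to IKIs by expectation-maximization and compares them with the Bayesian information criterion (BIC). It is not the procedure used in the study: the log transform of the IKIs, the use of BIC, the quantile-based initialization, and all function names (`fit_gmm_1d`, `bic`, `select_components`) are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, pooled variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    pooled_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    variances = [pooled_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor,
            )

    log_lik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in xs
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC for a 1-D GMM: k means + k variances + (k - 1) free weights."""
    return -2.0 * log_lik + (3 * k - 1) * math.log(n)


def select_components(ikis_ms, max_k=4):
    """Return (best_k, {k: BIC}) for GMMs fitted to log-transformed IKIs.

    Assumption: IKIs are positive durations (e.g. milliseconds) and are
    modelled on the log scale; the source does not specify this.
    """
    log_ikis = [math.log(t) for t in ikis_ms if t > 0]
    scores = {}
    for k in range(1, max_k + 1):
        _, _, _, log_lik = fit_gmm_1d(log_ikis, k)
        scores[k] = bic(log_lik, k, len(log_ikis))
    return min(scores, key=scores.get), scores
```

Calling `select_components(ikis_ms)` on one participant's list of IKIs returns the component count with the lowest BIC together with the score for each candidate count. Running it per participant gives one way to check how consistently a given number of components is preferred.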