Abstract
There is increasing evidence that hand gestures and speech synchronize their activity on multiple dimensions and timescales. For example, gesture’s kinematic peaks (e.g., maximum speed) are coupled with prosodic markers in speech. Such coupling operates on very short timescales, at the level of the syllable (200 ms), and its study therefore requires high-resolution measurement of gesture kinematics and speech acoustics. High-resolution speech analysis is common in gesture studies, given the field’s classic ties with (psycho)linguistics. However, the field has lagged behind in the objective study of gesture kinematics (e.g., as compared with research on instrumental action). Kinematic peaks in gesture are often measured by eye, with a “moment of maximum effort” determined by several raters. In the present article, we provide a tutorial on more efficient methods for quantifying the temporal properties of gesture kinematics, focusing on common challenges, and possible solutions, that come with the complexities of studying multimodal language. We further introduce and compare, using an actual gesture dataset (392 gesture events), the performance of two video-based motion-tracking methods (deep learning vs. pixel change) against a high-performance wired motion-tracking system (Polhemus Liberty). We show that the videography methods perform well in the temporal estimation of kinematic peaks, and thus provide an inexpensive alternative to costly motion-tracking systems. We hope that the present article encourages gesture researchers to take up the widespread objective study of gesture kinematics and their relation to speech.
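For concreteness, here is a minimal sketch (not code from the article) of how the two kinds of signals mentioned above might be computed. It assumes 3D position samples from a motion tracker and grayscale video frames; all function names and smoothing parameters are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.signal import savgol_filter

def peak_speed_time(t, xyz, window=11, poly=3):
    """Time (s) at which hand speed peaks.

    t   : (n,) sample times in seconds
    xyz : (n, 3) hand positions (e.g., from a motion tracker)
    window, poly : illustrative Savitzky-Golay smoothing settings
    """
    smoothed = savgol_filter(xyz, window, poly, axis=0)  # reduce tracker jitter
    vel = np.gradient(smoothed, t, axis=0)               # velocity per axis
    speed = np.linalg.norm(vel, axis=1)                  # Euclidean speed
    return t[np.argmax(speed)]

def pixel_change(frames):
    """Gross movement signal from video: summed absolute luminance change
    between consecutive frames. Peaks in the (n-1,) result index moments
    of maximal on-screen motion."""
    frames = frames.astype(np.float32)
    return np.abs(np.diff(frames, axis=0)).sum(axis=(1, 2))
```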
Highlights
There is increasing evidence that hand gestures and speech synchronize their activity on multiple dimensions and timescales
It can be argued that the absence of motion tracking from the standard methodological toolkit of the multimodal language researcher has led to imprecision and conceptual confusion
In Part II of this article, we provide a quantitative validation of two inexpensive videography-based motion-tracking methods by comparing their performance against a high-performance standard: a wired motion-tracking system, the Polhemus Liberty (a sketch of such a comparison is given below)
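As a hypothetical illustration (not the article's actual analysis code) of how such a validation could be summarized, one might compare, per gesture event, the peak timing estimated from video against the Polhemus reference and report the bias and spread of the offsets:

```python
import numpy as np

def peak_timing_agreement(video_peaks, polhemus_peaks):
    """Per-event peak-timing offsets (s): video-based minus wired reference.

    A positive mean offset would indicate that the video-based estimate
    lags the Polhemus estimate on average.
    """
    offsets = np.asarray(video_peaks) - np.asarray(polhemus_peaks)
    return {
        "mean_offset": offsets.mean(),       # systematic bias
        "sd_offset": offsets.std(ddof=1),    # variability across events
        "mean_abs_error": np.abs(offsets).mean(),
    }
```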
Summary
There is increasing evidence that hand gestures and speech synchronize their activity on multiple dimensions and timescales. The study of the temporal dynamics of gesture–speech coordination has lagged behind in its use of kinematic measurement methods, especially as compared to the degree to which state-of-the-art (psycho)linguistic methods are employed for the study of speech (e.g., Loehr, 2012; Shattuck-Hufnagel & Ren, 2018). This manifests itself in the relative scarcity (as compared to other research on instrumental action) of published studies that have applied motion tracking in gesture–speech research (Alexanderson, House, & Beskow, 2013; Alviar, Dale, & Galati, 2019; Chu & Hagoort, 2014; Danner, Barbosa, & Goldstein, 2018; Ishi, Ishiguro, & Hagita, 2014; Krivokapić, Tiede, & Tyrone, 2017; Krivokapić, Tiede, Tyrone, & Goldenberg, 2016; Leonard & Cummins, 2010; Parrell, Goldstein, Lee, & Byrd, 2014; Pouw & Dixon, 2019a; Quek et al., 2002; Rochet-Capellan et al., 2008; Rusiewicz et al., 2014; Treffner & Peter, 2002; Zelic, Kim, & Davis, 2015). Wagner and colleagues highlight that non-quantitative definitions have led to conceptual confusions that have made the literature markedly difficult to digest.