Abstract

Impairments to speech motor control from alcohol intoxication vary across individuals, making speaker-dependent approaches ideal for speech-based intoxication detection [Schiel et al., 2010. Proc. INTERSPEECH 2010]. Here, we evaluated whether individual acoustic features generalize well across speaker-dependent models. We selected 97 speakers (54 male, 43 female) from the Alcohol Language Corpus [Schiel et al., 2012. LRE. 46, 503-521] who had sufficient sober and intoxicated (>0.08% blood-alcohol concentration) recordings for speaker-dependent modeling. For each speaker, we extracted 9 features from vowels (F0–F3, jitter, shimmer, harmonics-to-noise ratio, duration, and duration variability) and 7 from consonants (spectral skewness and kurtosis, center of gravity, duration and duration variability, harmonics-to-noise ratio), and used them to fit speaker-dependent random forest models with 5-fold cross-validation, evaluating feature importance from the associated mean decrease in Gini impurity (GI). Across all speakers, consonant-based features tended to generalize better than vowel-based features, with spectral skewness and kurtosis being the most generalizable (GI: 0.11 and 0.09), and vowel duration and F2 being the least generalizable (GI: 0.04 and 0.03). Further experiments to explore additional features and evaluate sex-specific generalizability are ongoing. [Research funded by Tenvos Incorporated for the development of commercial speaker state-detection algorithms.]
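
For readers unfamiliar with the importance measure, the sketch below illustrates the quantity behind a mean decrease in Gini impurity. It is not the authors' pipeline: it computes the impurity decrease for a single threshold split, whereas a random forest accumulates such decreases over every node that splits on a feature and averages them over trees. The function names (`gini`, `gini_decrease`, `mean_importance`), the toy feature values, and the per-speaker importance numbers are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting the samples at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


def mean_importance(per_speaker):
    """Average each feature's importance over a list of per-speaker dicts."""
    features = per_speaker[0].keys()
    return {f: sum(s[f] for s in per_speaker) / len(per_speaker) for f in features}


# Hypothetical toy data for one speaker: 4 sober and 4 intoxicated recordings.
labels = ["sober"] * 4 + ["intoxicated"] * 4
separating_feature = [1, 2, 3, 4, 6, 7, 8, 9]     # splits the classes cleanly at 5
uninformative_feature = [1, 6, 2, 7, 3, 8, 4, 9]  # each side of 5 stays mixed

informative = gini_decrease(separating_feature, labels, threshold=5)
uninformative = gini_decrease(uninformative_feature, labels, threshold=5)

# Hypothetical per-speaker importances, averaged to compare features across speakers.
averaged = mean_importance([
    {"feature_a": 0.2, "feature_b": 0.0},
    {"feature_a": 0.4, "feature_b": 0.2},
])
```

In this toy case the cleanly separating feature removes all of the parent node's impurity, while the uninformative one removes none. Averaging per-speaker importances, as in `mean_importance`, is one simple way to compare features across speaker-dependent models; the abstract does not specify how the reported GI values were aggregated.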
