Acoustic-phonetic properties of Siri- and human-directed speech

Bruno Ferenc Segedin, Michelle Cohn, Georgia Zellou

doi:10.1016/j.wocn.2021.101123

Open Access

Journal: Journal of Phonetics	Publication Date: Dec 20, 2021
Citations: 24	License type: cc-by-nc-nd

Affiliation: University of California, Davis

Abstract

Millions of people engage in spoken interactions with voice-activated artificially intelligent (voice-AI) systems in their everyday lives. This study explores whether speakers have a voice-AI-specific register, relative to their speech toward an adult human. Furthermore, this study tests whether speakers use targeted error-correction strategies for voice-AI and human interlocutors. In a pseudo-interactive task with pre-recorded Siri and human voices, participants produced target words in sentences. In each turn, following an initial production and feedback from the interlocutor, participants repeated the sentence in one of three response conditions: after the interlocutor correctly identified the word, made a coda error, or made a vowel error. Across two studies, the rate of comprehension errors made by both interlocutors was varied (lower vs. higher error rate). Register differences are found: in Siri-directed speech (Siri-DS), participants speak louder, with lower mean f0 and a smaller f0 range. Many differences in Siri-DS emerged as dynamic adjustments over the course of the interaction. Additionally, error rate shapes how register differences are realized. One targeted error correction was observed: speakers produce more vowel hyperarticulation in coda repairs in Siri-DS. Taken together, these findings contribute to our understanding of speech register and the dynamic nature of talker-interlocutor interactions.
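The abstract reports register differences in mean f0 and f0 range. As an illustration only, the sketch below shows one common way to summarise a per-utterance f0 track. It is not the authors' measurement procedure. The input format (a list of per-frame f0 values in Hz, with unvoiced frames coded as 0 or None), the hypothetical function name `f0_summary`, and the choice to express range in semitones are all assumptions.

```python
import math
from statistics import mean


def f0_summary(f0_hz):
    """Return (mean f0 in Hz, f0 range in semitones) for one utterance.

    Assumption: f0_hz is a list of per-frame f0 estimates in Hz, with
    unvoiced frames coded as 0 or None (hypothetical input format).
    """
    # Keep only voiced frames; 0, 0.0 and None are all falsy and get dropped.
    voiced = [f for f in f0_hz if f]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    mean_f0 = mean(voiced)
    # Range on a semitone scale: 12 * log2(max / min), independent of the
    # speaker's overall pitch level.
    range_st = 12 * math.log2(max(voiced) / min(voiced))
    return mean_f0, range_st


if __name__ == "__main__":
    m, r = f0_summary([0, 200.0, 220.0, None, 400.0])
    print(round(m, 2), r)  # 273.33 12.0
```

Expressing range in semitones rather than Hz makes an octave span count the same (12 semitones) for lower- and higher-pitched talkers, which is one reason it is often preferred when comparing conditions across speakers.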

Full Text