Abstract

Many people interact daily with voice-user interfaces (VUIs), but acoustic research on VUI-directed speech (VDS) is relatively new. Prior work indicates that intensity and F0 correlate with VDS [Cohn et al., 2022, JPhon 90]. Multiple acoustic variables of VDS were analyzed to explore whether VDS is a register distinct from human-directed speech (HDS) and whether the perceived human-likeness of VUI voices affects VDS characteristics. Twenty-seven participants' Zoom recordings of 13 pre-scripted prompts and pre-recorded responses from two Amazon AWS Polly-generated VUI voices (rated for human-likeness independently and by participants) were acoustically analyzed for voice onset time (VOT) of word-initial voiceless plosives, pitch variation, and vowel quality and quantity. Results of linear mixed-effects models provide evidence of VDS-specific acoustic characteristics, some of which are affected by participants' perceived human-likeness of the voices. Differences between pre-exposure and VUI interactions occur for /p/ VOT in consonant clusters, vowel duration (except /ɪ/), and /ɑ/ F2. Statistically significant differences based on participants' perceived human-likeness of the voices are found for /p/ VOT in consonant clusters and for vowel quality (e.g., /ɑ/ F1 and F2, and /ɪ/ F2). This study contributes to the growing body of VDS work examining how humans speak with devices and what affects VDS, which may inform VUI voice development.

