Abstract

This study examines participants’ vocal accommodation toward text-to-speech (TTS) voices produced by three devices that vary in the extent to which they embody a human form. Thirty-eight speakers shadowed words produced by a male and a female TTS voice presented across three physical forms: an Amazon Echo smart speaker (least human-like), a Nao robot (slightly more human-like), and a Furhat robot (more human-like). Ninety-six independent raters completed a separate AXB perceptual similarity assessment, which provides a holistic evaluation of accommodation. Results show convergence to the voices across all physical forms; convergence is even stronger toward the female TTS voice when it is presented with the Echo smart speaker form, consistent with participants’ higher likability ratings and lower creepiness ratings for the Echo. We interpret our findings through the lens of communication accommodation theory (CAT), providing support for accounts of speech communication and human–computer interaction frameworks.
