Understanding How to Administer Voice Surveys through Smart Speakers

Jing Wei, Difeng Yu, Chaofan Wang, Tilman Dingler, Jorge Goncalves, Vassilis Kostakos, Weiwei Jiang

doi:10.1145/3555606

Abstract

Smart speakers have become exceedingly popular and entered many people's homes due to their ability to engage users in natural conversations. Researchers have also looked into using smart speakers as an interface to collect self-reported health data through conversations. Responding to surveys prompted by smart speakers requires users to listen to questions and answer by voice without any visual stimuli. Compared to traditional web-based surveys, where users can see the questions and answer options, voice surveys may be more cognitively challenging. Therefore, to collect reliable survey data, it is important to understand what types of questions are suitable to be administered by smart speakers. We selected five common survey questionnaires and deployed them as voice surveys and web surveys in a within-subject study. Our 24 participants answered questions using voice and web questionnaires in one session. They then repeated the same study session after 1 week to provide a "retest" response. Our results suggest that voice surveys have comparable reliability to web surveys. We find that, when using 5-point or 7-point scales, voice surveys take about twice as long as web surveys. Based on objective measurements, such as response agreement and test-retest reliability, and subjective evaluations of user experience, we recommend that researchers consider adopting binary scales and 5-point numerical scales for voice surveys on smart speakers.

Full Text