ABSTRACT
To give product and brand recommendations, marketers use conversational agents, which increasingly communicate via voice rather than text. Existing research comparing the persuasiveness of text and voice agents has shown mixed results. The quality of the speech synthesis employed may strongly influence consumers' responses. This study investigates to what extent a voice agent with pragmatically aligned prosody is more persuasive (i.e. yields a more positive brand attitude) than an agent using a standard voice or text, and whether perceived human-likeness and perceived personalisation provide an underlying mechanism that explains these differences. In an experiment (n = 212), participants interacted with a conversational agent that recommended a camera. Results showed that a voice agent using prosody aligned to the user's information state is more persuasive than a text agent. This effect is mediated by perceived human-likeness and perceived personalisation. Hence, aligned prosody can help synthetic speech reach the quality threshold needed to be perceived as more human-like. Theoretically, this study helps to unravel why conversational agents with human-like features are more persuasive.