Speech Synthesis for the Generation of Artificial Personality

Matthew P Aylett,Alessandro Vinciarelli,Mirjam Wester

doi:10.1109/taffc.2017.2763134

Matthew P Aylett, Alessandro Vinciarelli + Show 1 more

Open Access

https://doi.org/10.1109/taffc.2017.2763134

Copy DOI

Abstract

A synthetic voice personifies the system using it. In this work we examine the impact text content, voice quality and synthesis system have on the perceived personality of two synthetic voices. Subjects rated synthetic utterances based on the Big-Five personality traits and naturalness. The naturalness rating of synthesis output did not correlate significantly with any Big-Five characteristic except for a marginal correlation with openness. Although text content is dominant in personality judgments, results showed that voice quality change implemented using a unit selection synthesis system significantly affected the perception of the Big-Five, for example tense voice being associated with being disagreeable and lax voice with lower conscientiousness. In addition a comparison between a parametric implementation and unit selection implementation of the same voices showed that parametric voices were rated as significantly less neurotic than both the text alone and the unit selection system, while the unit selection was rated as more open than both the text alone and the parametric system. The results have implications for synthesis voice and system type selection for applications such as personal assistants and embodied conversational agents where developing an emotional relationship with the user, or developing a branding experience is important.

Full Text