Abstract

We describe progress on creating digital ventriloquized actors (DIVAs). DIVAs use hand gestures to synthesize audiovisual speech and song by means of an intermediate conversion of hand gestures to articulator (e.g., tongue, jaw, lip, and vocal cord) parameters of a computational three-dimensional vocal tract model. Our parallel-formant speech synthesizer has been modified to run within the MAX/MSP visual programming language, and we have added spatial sound and various voice excitation parameters in an easy-to-use environment suitable for musicians. The musician's gesture style is learned from examples. DIVAs will be used in three composed stage works of increasing complexity performed internationally, beginning with a single performer and culminating in three performers simultaneously using their natural voices as well as the hand-based synthesizer. Training performances will be used to study the processes associated with skill acquisition, the coordination of multiple "voices" within and among performers, and the intelligibility and realism of this new form of audiovisual speech production. We are also building a robotic face and a computer-graphics face that will be gesture controlled and synchronized with the speech and song. [This project is funded by the Canada Council for the Arts and the Natural Sciences and Engineering Research Council of Canada. More information is at www.magic.ubc.ca/VisualVoice.htm]
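As a rough illustration of the kind of gesture-to-parameter mapping described above, the sketch below learns a mapping from hand-gesture features to formant-synthesizer parameters from example pairs. The feature dimensions, the use of ridge regression, and the numeric data are assumptions for illustration only; they are not the project's actual implementation.

```python
import numpy as np

# Hypothetical sketch: learn a mapping from hand-gesture features
# (e.g., finger flexion, hand position) to formant-synthesizer parameters
# (e.g., formant frequencies, voicing amplitude). Shapes, feature meanings,
# and the choice of ridge regression are illustrative assumptions.

rng = np.random.default_rng(0)

# Training examples: each row pairs one captured hand pose (8 gesture features)
# with the synthesizer parameters chosen for that pose (4 outputs).
X = rng.normal(size=(200, 8))   # gesture features from a data glove or tracker
Y = rng.normal(size=(200, 4))   # target synthesizer parameters

# Ridge regression in closed form: W = (X^T X + lam * I)^-1 X^T Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def gesture_to_params(gesture: np.ndarray) -> np.ndarray:
    """Map a single gesture feature vector to synthesizer parameters."""
    return gesture @ W

# At performance time, each incoming hand pose would be converted to parameters
# that drive the parallel-formant synthesizer (e.g., inside MAX/MSP).
print(gesture_to_params(rng.normal(size=8)))
```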
