SOUND STREAM: Towards vocal sound synthesis via dual-handed simultaneous control of articulatory parameters

Pramit Saha, Venkata Praneeth Srungarapu, Debasish R Mohapatra, Sid Fels

doi:10.1121/1.5068362

Abstract

This paper introduces Sound Stream, a low-cost, tangible, ambidextrous controller that drives a dynamic, muscle-based model of the human vocal tract for articulatory speech synthesis. The controller provides multidimensional inputs, which are mapped via a microcontroller to the tongue muscles in the biomechanical modeling toolkit ArtiSynth. Because the vocal tract is a complex biological structure containing many muscles, providing control over every muscle in the proposed scheme would be challenging and computationally expensive. We therefore take a simplified approach and control only selected muscles, which keeps articulatory speech synthesis efficient. The goal of the ambidextrous design is to open new possibilities for controlling multiple parameters simultaneously, varying tongue position and shape together to generate a range of expressive vocal sounds. As a demonstration, the user learns to interact with and control a mid-sagittal view of the tongue structure in ArtiSynth through a set of sensors operated with both hands. Sound Stream explores and evaluates appropriate input and mapping methods for designing a controllable speech synthesis engine.

1. Wang, J. et al. (2011) “Squeezy: Extending a multi-touch screen with force sensing objects for controlling articulatory synthesis,” in Proceedings on New Interfaces for Musical Expression, Oslo, Norway, pp. 531–532.
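
The abstract does not specify the sensors, the selected muscles, or the mapping function. As a purely illustrative sketch of what mapping sensor inputs from both hands to a small set of muscle activations could look like, the Python below assumes a 10-bit analog reading per sensor, one sensor channel per muscle, and a linear scaling to the range 0 to 1. The channel names, the choice of muscles, and both function names are hypothetical and are not taken from the paper or from ArtiSynth.

```python
# Hypothetical sensor-to-muscle mapping. Channel names, muscle names, and the
# 10-bit ADC range are illustrative assumptions, not details from the paper.
ADC_MAX = 1023  # assumed 10-bit analog-to-digital converter

# Assumed assignment: one sensor channel per selected tongue muscle,
# with two channels under each hand.
CHANNEL_TO_MUSCLE = {
    "left_0": "genioglossus_posterior",
    "left_1": "genioglossus_anterior",
    "right_0": "styloglossus",
    "right_1": "hyoglossus",
}


def normalize(raw, lo=0, hi=ADC_MAX):
    """Clamp a raw reading to [lo, hi] and scale it linearly to [0.0, 1.0]."""
    clamped = max(lo, min(hi, raw))
    return (clamped - lo) / (hi - lo)


def map_to_excitations(readings):
    """Convert raw readings from both hands into muscle excitations in [0, 1].

    Channels missing from `readings` are treated as 0 (muscle at rest).
    """
    return {
        muscle: normalize(readings.get(channel, 0))
        for channel, muscle in CHANNEL_TO_MUSCLE.items()
    }
```

Calling `map_to_excitations({"left_0": 512, "right_0": 1023})` would yield one value per muscle, which a host program could then forward to the biomechanical model. A linear one-to-one mapping is the simplest possible choice; evaluating alternative mappings is the kind of question the abstract says Sound Stream is meant to explore.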

Full Text