Voice quality modification for emotional speech synthesis

Christophe D'Alessandro, Boris Doval

doi:10.21437/eurospeech.2003-474

Abstract

Synthesis of expressive speech has demonstrated that convincing, natural-sounding results are impossible to obtain without dealing with voice quality parameters. Time-domain and spectral-domain models of the voice source signal are presented. Then algorithms for analysis and synthesis of voice quality are discussed, including modification of the periodic and aperiodic components. These algorithms may be useful for applications such as pre-processing of speech corpora, modification of voice quality parameters together with intonation in synthesis, and voice transformation.

1. Introduction

Until recently, voice quality and its functions in speech communication have been only marginally considered in the speech communication community. However, there is some evidence that voice quality settings and voice quality modulations play a central role in human voice-based communication. The influence of prosody on the segmental aspects of speech involves some variation of voice quality and vocal effort. Therefore, it seems necessary to take these effects into account in speech synthesis, and to search for methods that are able to deal with voice quality and thus with the interactions between intonation and segmental aspects.

Most studies on emotional speech synthesis have focused more on intonation parameters (f0, duration) than on voice quality parameters, except perhaps for a stress parameter related to the voice spectral tilt [8]. Most of the work addressing the problem of voice quality synthesis has been carried out in the framework of formant synthesis [6], simply because voice quality parameters are explicit parameters of the synthesizer in this case. However, voice quality modification is also an important problem in the framework of concatenation-based synthesis. This has been recognized particularly in situations where sound quality is of paramount importance, e.g. for computer music and musical acoustics.
Real-time spectral modification procedures have been proposed [1] for singing voice morphing and vocal impersonation. High-fidelity music synthesis featuring a “Virtual Castrato” was produced for the soundtrack of the film “Farinelli” [2, 5] by morphing the voices of a male and a female singer. Processing was based on concatenation synthesis and voice quality modification. Voice quality was manually modified by harmonic adaptive dynamic filtering using a phase vocoder technique. For speech synthesis a more systematic approach is needed, simply because a trial-and-error method cannot be used in automatic systems.

The work described herein is based on time-varying spectral processing of speech units for voice quality modification, as in [1, 5]. Section 2 reviews a spectral theory for the description of voice models. Section 3 presents the algorithms used for modification of voice quality. The last section contains a discussion and a conclusion.
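
The introduction mentions a stress parameter related to the voice spectral tilt and a harmonic-adaptive form of spectral filtering. As a rough illustration of what changing spectral tilt can mean in practice, the sketch below rescales a set of harmonic amplitudes by a slope expressed in dB per octave. It is not the authors' algorithm: the function names, the dB-per-octave parameterization and the choice of anchoring the slope at the fundamental are all assumptions made for this example.

```python
import math


def tilt_gain(freq_hz, tilt_db_per_octave, ref_hz):
    """Linear gain at freq_hz for a tilt anchored at ref_hz.

    Assumption: the gain is 0 dB at and below ref_hz and changes by
    tilt_db_per_octave for every doubling of frequency above it.
    """
    if freq_hz <= ref_hz:
        return 1.0
    gain_db = tilt_db_per_octave * math.log2(freq_hz / ref_hz)
    return 10.0 ** (gain_db / 20.0)


def apply_tilt_to_harmonics(amplitudes, f0_hz, tilt_db_per_octave, ref_hz=None):
    """Scale harmonic amplitudes; harmonic k (k = 1, 2, ...) sits at k * f0_hz.

    Hypothetical helper: by default the tilt is anchored at the fundamental,
    so the first harmonic is left unchanged.
    """
    if ref_hz is None:
        ref_hz = f0_hz
    return [
        a * tilt_gain(k * f0_hz, tilt_db_per_octave, ref_hz)
        for k, a in enumerate(amplitudes, start=1)
    ]


if __name__ == "__main__":
    flat = [1.0] * 8  # eight harmonics of equal amplitude
    # A negative slope attenuates high harmonics (a softer, laxer voice);
    # a positive slope boosts them (a tenser, more effortful voice).
    softer = apply_tilt_to_harmonics(flat, f0_hz=100.0, tilt_db_per_octave=-6.0)
    for k, a in enumerate(softer, start=1):
        print(f"harmonic {k}: {a:.3f}")
```

In a time-varying setting the slope would be updated from frame to frame; here a single static frame is shown for brevity.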

Full Text