Abstract

Synthesis of expressive speech has demonstrated that convincing, natural-sounding results are impossible to obtain without dealing with voice quality parameters. Time-domain and spectral-domain models of the voice source signal are presented. Then algorithms for analysis and synthesis of voice quality are discussed, including modification of the periodic and aperiodic components. These algorithms may be useful for applications such as pre-processing of speech corpora, modification of voice quality parameters together with intonation in synthesis, and voice transformation.

1. Introduction

Until recently, voice quality and its functions in speech communication have been only marginally considered in the speech communication community. However, there is some evidence that voice quality settings and voice quality modulations play a central role in human voice-based communication. The influence of prosody on the segmental aspects of speech involves some variation of voice quality and vocal effort. Therefore, it seems necessary to take these effects into account in speech synthesis, and to search for methods that are able to deal with voice quality and thus with the interactions between intonation and segmental aspects.

Most studies on emotional speech synthesis have focused more on intonation parameters (f0, duration) than on voice quality parameters, except perhaps for a stress parameter related to the voice spectral tilt [8]. Most of the work addressing the problem of voice quality synthesis has been done in the framework of formant synthesis [6], simply because voice quality parameters are explicit parameters of the synthesizer in this case. However, voice quality modification is also an important problem in the framework of concatenation-based synthesis. This has been recognized particularly in situations where sound quality is of paramount importance, e.g. for computer music and musical acoustics. Real-time spectral modification procedures have been proposed [1] for singing voice morphing and vocal impersonation. High-fidelity music synthesis featuring a “Virtual Castrato” has been produced for the soundtrack of the film “Farinelli” [2, 5] by morphing the voices of a male and a female singer. Processing was based on concatenation synthesis and voice quality modification. Voice quality was manually modified by harmonic adaptive dynamic filtering using a phase vocoder technique. For speech synthesis a more systematic approach is needed, simply because a trial-and-error method cannot be used in automatic systems.

The work described herein is based on time-varying spectral processing of speech units for voice quality modification, as in [1, 5]. In Section 2, a spectral theory for the description of voice models is reviewed. Section 3 presents the algorithms used for modification of voice quality. The last section contains a discussion and a conclusion.
