Abstract

Online unit selection from large speech databases provides an opportunity to essentially play back words, phrases, and even sentences which were included in a recorded corpus. This capability can be extremely useful for limited domains, e.g., application prompts. Without switching voices, such a synthesizer could integrate high-quality synthesis with near-perfect recorded material. However, traditional post-lexical processing (PLP) considers only the phoneme specifications and not the sequences which actually exist in the target database. Phonemes supplied by the dictionary are typically rewritten into a single sequence of phones with reduced vowels, flapped t’s, etc. Given the enormous variability of human speech, any single sequence is unlikely to match an entire phrase or prompt as spoken and labeled. This paper addresses the use of flexible PLP, allowing multiple transcription possibilities which are essentially equivalent, at least for the speaker in question. By building the equivalences from the specific dictionary used by the synthesizer and the detailed phonetic labeling of a specific voice database, longer regions of the database can be selected, reducing the number of concatenation points in ordinary synthesis and increasing the odds of selecting complete recorded phrases.
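
As a rough illustration of the idea, and not the method the paper itself uses, the sketch below shows how letting each dictionary phoneme match a set of equivalent phone labels can lengthen the contiguous region found in a labeled database. The `EQUIVALENTS` table, the phone symbols, and the function names are all hypothetical; in practice the equivalences would be derived from the synthesizer's dictionary and the voice database's labels.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical equivalence table: each dictionary phoneme maps to the phone
# labels treated as interchangeable for this speaker (assumed values).
EQUIVALENTS: Dict[str, Set[str]] = {
    "t": {"t", "dx"},    # plain t or flapped t
    "ih": {"ih", "ax"},  # full vowel or reduced vowel
}


def allowed(phoneme: str, flexible: bool) -> Set[str]:
    """Phone labels accepted as a realization of one dictionary phoneme."""
    if flexible:
        return EQUIVALENTS.get(phoneme, {phoneme})
    return {phoneme}


def longest_match(target: List[str], labels: List[str],
                  flexible: bool = True) -> Tuple[int, int, int]:
    """Return (target_start, db_start, length) of the longest contiguous run
    in which every database label is an allowed realization of the
    corresponding target phoneme (longest-common-substring style DP)."""
    best = (0, 0, 0)
    prev = [0] * (len(labels) + 1)  # run lengths ending at the previous target phoneme
    for i in range(1, len(target) + 1):
        options = allowed(target[i - 1], flexible)
        cur = [0] * (len(labels) + 1)
        for j in range(1, len(labels) + 1):
            if labels[j - 1] in options:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


# Dictionary phonemes for "butter" and a database stretch labeled with a flap.
target = ["b", "ah", "t", "er"]
labels = ["sil", "b", "ah", "dx", "er", "sil"]

rigid = longest_match(target, labels, flexible=False)
flex = longest_match(target, labels, flexible=True)
```

With a single rigid transcription, the flapped `dx` in the database breaks the match after two phones; with the equivalence in place, the whole word is covered by one contiguous region, so no concatenation point is needed inside it.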
