Abstract

Speech synthesis is the artificial production of human speech. A system that generates speech from text is called a text-to-speech (TTS) system. A TTS system takes as input text and voice models for one or more languages, and produces speech that corresponds to the given voice models. The primary goal of TTS synthesis is to convert arbitrary input text into intelligible, natural-sounding speech; a TTS system trained for a particular language can synthesize arbitrary text in that language. Speech synthesis systems can be extremely useful to people who are visually impaired or illiterate, helping them participate in mainstream society. More recent applications include spoken dialogue systems and communicative robots.

Hidden Markov Model (HMM) based speech synthesis is an emerging technology for TTS. An HMM-based system consists of a training phase and a synthesis phase. In the training phase, spectral and excitation parameters are extracted from a speech database and modeled by context-dependent HMMs. In the synthesis phase, the system generates suitable spectral and excitation parameters from the trained models and uses them to produce the speech waveform. The accuracy of a speech synthesis system depends on the quality of the recorded speech database and on the speaker.

A TTS system uses linguistic analysis to determine correct pronunciation and prosody (pitch, duration, etc.), and acoustic representations of speech to generate waveforms. It has two main components: the front-end and the back-end. The front-end, the part of the system closer to the text input, performs text analysis: it converts ambiguous symbols such as dates into their equivalent words and converts graphemes to phonemes.
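As an illustration of the front-end's text normalization step, here is a minimal Python sketch that expands numeric dates into words. It assumes dates are written DD/MM/YYYY and that the output language is English; the day/month order is exactly the kind of ambiguity a real front-end has to resolve from context or locale. The function names are hypothetical, and grapheme-to-phoneme conversion, which would normally follow using a pronunciation lexicon and letter-to-sound rules, is not shown.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Irregular ordinal endings; other words just take "th" (e.g. "four" -> "fourth").
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}
# Assumed date format: DD/MM/YYYY.
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def two_digits(n: int) -> str:
    """Spell out 0..99 in English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def ordinal(n: int) -> str:
    """Spell out 1..99 as an ordinal, e.g. 21 -> 'twenty-first'."""
    head, sep, last = two_digits(n).rpartition("-")
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"  # twenty -> twentieth
    else:
        last += "th"
    return head + sep + last


def year(y: int) -> str:
    """Read a four-digit year the way it is commonly spoken in English."""
    hi, lo = divmod(y, 100)
    if hi % 10 == 0 and lo < 10:  # 2000 -> two thousand, 2005 -> two thousand five
        words = f"{ONES[hi // 10]} thousand"
        return words if lo == 0 else f"{words} {ONES[lo]}"
    if lo == 0:
        return f"{two_digits(hi)} hundred"        # 1900 -> nineteen hundred
    if lo < 10:
        return f"{two_digits(hi)} oh {ONES[lo]}"  # 1905 -> nineteen oh five
    return f"{two_digits(hi)} {two_digits(lo)}"   # 1998 -> nineteen ninety-eight


def expand_dates(text: str) -> str:
    """Replace DD/MM/YYYY dates in text with their spoken English form."""
    def repl(m: re.Match) -> str:
        day, month, yr = int(m.group(1)), int(m.group(2)), int(m.group(3))
        if not (1 <= day <= 31 and 1 <= month <= 12):
            return m.group(0)  # not a plausible date: leave it untouched
        return f"the {ordinal(day)} of {MONTHS[month - 1]} {year(yr)}"
    return DATE.sub(repl, text)
```

For example, `expand_dates("The meeting is on 21/03/2023.")` returns `"The meeting is on the twenty-first of March twenty twenty-three."`. Strings that match the pattern but are not plausible dates are passed through unchanged, leaving them for later normalization rules.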
The back-end, the part of the system closer to the speech output, converts the output of the front-end (phonetic transcriptions and prosodic information) into the corresponding waveform.
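In an HMM-based system, the back-end's synthesis phase first turns the trained context-dependent models into a smooth sequence of acoustic parameters. A common way to do this is maximum-likelihood parameter generation using static and delta (dynamic) features, which amounts to solving the linear system (Wᵀ P W) c = Wᵀ P μ. The Python sketch below shows the idea for a single feature dimension. The state values, the [-0.5, 0, 0.5] delta window, and the function names are illustrative assumptions. Real systems typically add delta-delta features, model voiced and unvoiced F0 separately, and pass the generated spectral and excitation parameters through a vocoder; none of that is shown here.

```python
# Minimal sketch of HMM-style parameter generation for ONE feature dimension
# (e.g. log F0 or a single mel-cepstral coefficient). Assumptions: each state
# holds a diagonal Gaussian over [static, delta], the delta window is
# [-0.5, 0, 0.5], and state durations (in frames) are already known.


def expand_states(states):
    """Repeat each state's (mean, variance) pair for its duration in frames.

    states: list of (duration, (static_mean, delta_mean), (static_var, delta_var)).
    """
    means, variances = [], []
    for duration, mean, var in states:
        means += [mean] * duration
        variances += [var] * duration
    return means, variances


def solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mlpg(means, variances):
    """Return the static trajectory c maximizing the likelihood of [c, delta c].

    Solves (W^T P W) c = W^T P mu, where W stacks the static and delta windows
    and P is the diagonal precision (inverse variance) matrix.
    """
    T = len(means)
    rows = []  # each row of W: ({frame: weight}, target mean, precision)
    for t in range(T):
        rows.append(({t: 1.0}, means[t][0], 1.0 / variances[t][0]))
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)  # clamp at the edges
        delta = {}
        delta[nxt] = delta.get(nxt, 0.0) + 0.5
        delta[prev] = delta.get(prev, 0.0) - 0.5
        rows.append((delta, means[t][1], 1.0 / variances[t][1]))
    R = [[0.0] * T for _ in range(T)]
    r = [0.0] * T
    for weights, mu, prec in rows:
        for i, wi in weights.items():
            r[i] += wi * prec * mu
            for j, wj in weights.items():
                R[i][j] += wi * prec * wj
    return solve(R, r)


# Two hypothetical states: 5 frames with mean 0.0, then 5 frames with mean 1.0.
states = [(5, (0.0, 0.0), (1.0, 1.0)), (5, (1.0, 0.0), (1.0, 1.0))]
trajectory = mlpg(*expand_states(states))
print([round(c, 2) for c in trajectory])
```

If the delta terms were dropped, the result would simply repeat each state's mean and jump abruptly at the state boundary. The delta constraints pull neighbouring frames toward each other, so the generated trajectory changes gradually across the boundary instead.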
