Abstract
This paper describes a Hidden Markov Model (HMM)-based Punjabi text-to-speech synthesis system (HTS), in which the speech waveform is generated from the HMMs themselves, built on the general speech synthesis architecture of HTK (the Hidden Markov Model Toolkit). Such an HMM-based TTS system can be used in mobile phones to read out a stored phone directory or messages. Text messages and the caller's identity in English are mapped to tokens in Punjabi, which are then concatenated to form speech according to certain rules and procedures. To build the synthesizer, we recorded a speech database and segmented it phonetically, first extracting context-independent monophones and then context-dependent triphones. For example, for the word bharat the monophones are a, bh, t, etc., and a triphone is bh-a+r. These speech utterances and their phone-level transcriptions (monophones and triphones) are the inputs to the speech synthesis system. The system outputs the sequence of phonemes after resolving ambiguities in phoneme selection using word network files; for the word Tapas, for example, it selects one phoneme sequence over a competing alternative.
Highlights
Speech is the most important form of communication in everyday life
This paper is organized as follows: Section 2 presents Hidden Markov Model-based speech synthesis; Section 3 describes the overall implementation of the Hidden Markov Model-based text-to-speech system on the Hidden Markov Model Toolkit architecture, from feature extraction to training of the system; Section 4 contains the results of speech synthesis; and Section 5 concludes the paper with the discussion and conclusion
An HMM-based Punjabi speech synthesis system is presented in this paper
Summary
Speech is the most important form of communication in everyday life. The dependence of human-computer interaction on written text and images makes computers impossible to use for visually and physically impaired users and for the illiterate masses [1]. One approach to speech synthesis models the human articulators (lips, tongue, jaw, etc.) and articulatory processes directly. It is the most difficult method to implement, owing to the lack of knowledge of the complex human articulation organs. Rule-based formant synthesis can produce quality speech, but it sounds unnatural, since it is difficult to estimate the vocal tract model and source parameters [3]. Another approach is Hidden Markov Model-based synthesis, i.e. HTS. It was initially implemented for Japanese but can today be implemented for various languages, such as Hindi, English and Tamil. It models prosody and various voice characteristics on the basis of probabilities, without requiring large databases. In this approach, speech utterances are used to extract spectral parameters (mel-cepstral coefficients) and excitation parameters and to train context-dependent phone models; these models are, in turn, concatenated and used to synthesize the speech waveform corresponding to the text input. This paper is organized as follows: Section 2 presents Hidden Markov Model-based speech synthesis; Section 3 describes the overall implementation of the Hidden Markov Model-based text-to-speech system on the Hidden Markov Model Toolkit architecture, from feature extraction to training of the system; Section 4 contains the results of speech synthesis; and Section 5 concludes the paper with the discussion and conclusion.
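To make the notion of context-dependent phone models concrete, the following is a minimal sketch of how a monophone sequence could be expanded into triphone labels in the left-centre+right notation (for example bh-a+r) used above. It is an illustration only, not the paper's procedure: the function name `to_triphones` is hypothetical, the phone sequence bh, a, r, a, t assumed for the word bharat is a guess at the segmentation, and giving word-boundary phones only the context that exists follows the usual word-internal convention.

```python
def to_triphones(phones):
    """Expand a monophone list into word-internal triphone labels.

    Labels use the left-centre+right notation, e.g. "bh-a+r".
    Phones at the word edges get only the context that exists,
    so the first and last labels are biphones.
    """
    labels = []
    last = len(phones) - 1
    for i, centre in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < last else ""
        labels.append(left + centre + right)
    return labels


# Assumed (hypothetical) monophone segmentation of the word "bharat".
bharat = ["bh", "a", "r", "a", "t"]
print(to_triphones(bharat))
```

For the assumed sequence, the second label is bh-a+r, which matches the triphone example given for the word bharat.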