Abstract

This paper presents the implementation details of a good-quality Kannada text-to-speech system (KTTS) that is phoneme-based, uses direct waveform concatenation, is easy to set up and use, and requires little memory. Most existing TTS systems are unit-selection based and rely on standard speech databases recorded in neutral adult voices. Prosody has also been incorporated into the system. Incorporating emotional features into speech can greatly improve the naturalness of a speech synthesis system. Since emotional speech can be regarded as a variation on neutral (non-emotional) speech, a robust neutral speech model is expected to be useful for contrasting the different emotions expressed in speech. Duration, pitch, and stress are presented as the main acoustic correlates of emotion in human speech. This inexpensive TTS system was implemented in MATLAB, with synthesis made available through a graphical user interface. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).

Keywords: Kannada Text-to-speech (KTTS); Direct wave concatenation; Prosody; Unit-selection based; Mean opinion score (MOS).
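
To make the two technical ideas named above concrete, the following is a minimal illustrative sketch, not the authors' MATLAB implementation. It assumes that each phoneme maps to one prerecorded waveform (here a short list of made-up sample values) and that MOS is the plain average of listener ratings on the common 1 to 5 scale. The names `concatenate_units`, `mean_opinion_score`, and `toy_db` are hypothetical and chosen only for illustration.

```python
from statistics import mean


def concatenate_units(phonemes, unit_db):
    """Join prerecorded phoneme waveforms end to end (direct concatenation)."""
    samples = []
    for p in phonemes:
        if p not in unit_db:
            raise KeyError(f"no recorded unit for phoneme {p!r}")
        samples.extend(unit_db[p])
    return samples


def mean_opinion_score(ratings):
    """Average listener ratings, assumed to lie on a 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical toy database: tiny made-up sample lists stand in for recordings.
toy_db = {"k": [0.0, 0.2], "a": [0.5, 0.4, 0.1], "n": [-0.1, -0.3]}

waveform = concatenate_units(["k", "a", "n"], toy_db)
score = mean_opinion_score([4, 3, 5, 4])
```

A real system would additionally adjust duration, pitch, and stress on the joined units to carry prosody and emotion; that step is omitted here because the abstract does not describe how it is done.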
