Abstract

Developing a text-to-speech (TTS) system that sounds like natural human speech has been attempted for many years, but even the best TTS algorithms currently available have not yet achieved it. Most of them still sound robotic unless they incorporate recorded human speech. However, using recorded human speech requires building a large database covering every word of the language, which is an onerous task. This research article presents a new approach that reduces the database size by using “syllable-based concatenative speech synthesis”. In this method, new words are ‘created’ from existing words and syllables in the database. The naturalness of these ‘created’ words is further improved by ‘position-based syllabification’ and ‘objective spectral noise reduction’. Syllabification uses a combination of neural-network classification and non-neural methods. After new words are ‘created’, the spectral distortion present at the joins is reduced with objective spectral estimation and reduction methods in the time and frequency domains. These approaches improve the naturalness of the proposed Marathi TTS system.
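The idea of building a word from stored syllable units and then treating the joins can be illustrated with a small sketch. This is not the authors' implementation. The unit database, the syllable labels (`ma`, `ra`, `thi`), the linear crossfade (a simple time-domain smoothing swapped in for the paper's spectral reduction methods), and the log-spectral distance used as an objective measure of mismatch at a join are all assumptions chosen for brevity.

```python
import math
from typing import Dict, List

# Hypothetical unit database: syllable label -> mono waveform samples.
# A real system would store recorded speech for each syllable or word.
UnitDB = Dict[str, List[float]]


def crossfade_concat(units: List[List[float]], overlap: int) -> List[float]:
    """Join waveforms, blending up to `overlap` samples at each joint linearly."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    out: List[float] = list(units[0]) if units else []
    for nxt in units[1:]:
        n = min(overlap, len(out), len(nxt))
        start = len(out) - n
        for i in range(n):
            # Weight of the incoming unit rises from near 0 to near 1.
            w = (i + 1) / (n + 1)
            out[start + i] = (1 - w) * out[start + i] + w * nxt[i]
        out.extend(nxt[n:])  # append the rest of the incoming unit
    return out


def synthesize_word(syllables: List[str], db: UnitDB, overlap: int = 32) -> List[float]:
    """'Create' a word by concatenating stored syllable units (illustrative only)."""
    missing = [s for s in syllables if s not in db]
    if missing:
        raise KeyError(f"no recorded unit for: {missing}")
    return crossfade_concat([db[s] for s in syllables], overlap)


def log_spectrum(frame: List[float]) -> List[float]:
    """Log-magnitude spectrum in dB via a naive DFT (bins 0..N/2)."""
    n_pts = len(frame)
    mags = []
    for k in range(n_pts // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n_pts) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n_pts) for t, x in enumerate(frame))
        mags.append(20 * math.log10(math.hypot(re, im) + 1e-12))  # floor avoids log(0)
    return mags


def log_spectral_distance(a: List[float], b: List[float]) -> float:
    """RMS difference in dB between the log spectra of two equal-length frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and of equal length")
    sa, sb = log_spectrum(a), log_spectrum(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa))


def join_distance(left: List[float], right: List[float], frame: int = 32) -> float:
    """Spectral mismatch between the end of `left` and the start of `right`."""
    return log_spectral_distance(left[-frame:], right[:frame])


if __name__ == "__main__":
    # Toy placeholder units standing in for recorded syllables.
    db = {"ma": [0.1] * 100, "ra": [0.2] * 100, "thi": [0.3] * 100}
    word = synthesize_word(["ma", "ra", "thi"], db, overlap=32)
    print(len(word))
    print(join_distance(db["ma"], db["ra"], frame=32))
```

A crossfade only smooths the waveform at the joint. A distance such as `join_distance` could instead be used to choose among candidate units or to decide where further spectral correction is needed.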
