Abstract

Text-to-speech (TTS) synthesizer has been proved to be an aiding tool for many visually challenged people for reading through hearing feedback. There are TTS synthesizers available in English, however, it has been observed that people feel more comfortable in hearing their own native language. Keeping this point in mind, Gujarati TTS synthesizer has been built. This TTS system has been built in Festival speech synthesis framework. Syllable is taken as the basic unit in building Gujarati TTS synthesizer as Indian languages are syllabic in nature. In building the unit-selection based Gujarati TTS system, one requires large Gujarati labeled corpus. The task of labeling is most time-consuming and tedious. This task requires large manual efforts. Therefore, in this work, an attempt has been made to reduce these efforts by automatically generating labeled corpus at syllable-level. To that effect, a Gaussian-based segmentation method has been proposed for automatic segmentation of speech at syllable-level. It has been observed that percentage correctness of labeled data is around 80% for both male and female voice as compared to 70% for group delay-based labeling. In addition, the system built on the proposed approach shows better intelligibility when evaluated by a visually challenged subject. The word error rate is reduced by 5% for Gaussian filter-based TTS system, compared to group delay-based TTS system. Also, 5% increment is observed in correctly synthesized words. The main focus of this work is to reduce the manual efforts required in building TTS system (which are primarily the manual efforts required in labeling speech data) for Gujarati.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call