Abstract

This paper presents methodologies for text normalization and diphone preparation in Bangla Text-to-Speech (TTS) synthesis. A concatenation-based TTS system comprises two main modules: natural language processing and Digital Signal Processing (DSP). The natural language processing module converts text to its pronounceable form, a step called text normalization; selecting diphones from the normalized text is called Grapheme-to-Phoneme (G2P) conversion. The text normalization issues addressed in this paper include tokenization, conjuncts, null-modified characters, numerical words, abbreviations, and acronyms. The issues related to diphone preparation include diphone categorization, corpus preparation, diphone labeling, and diphone selection. Appropriate rules and algorithms are proposed to tackle each of these issues. We developed a speech synthesizer for Bangla using a diphone-based concatenative approach, which is demonstrated to produce natural-sounding synthetic speech.
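As a rough illustration of diphone selection in general, the sketch below turns a phoneme sequence into the diphone units a concatenative synthesizer would look up. It is not the paper's algorithm: the function name `phonemes_to_diphones`, the silence symbol `_`, the `left-right` label format, and the romanized phoneme labels are all assumptions made for this example.

```python
from typing import List

# Assumed boundary symbol for silence; the paper's own labeling may differ.
SILENCE = "_"


def phonemes_to_diphones(phonemes: List[str]) -> List[str]:
    """Pair each phoneme with its successor, padding both ends with silence.

    Each diphone spans the transition from one phoneme to the next, so a
    sequence of n phonemes yields n + 1 diphones once silence is added.
    """
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Hypothetical romanized phoneme labels, used only for illustration.
diphones = phonemes_to_diphones(["b", "a", "N"])
```

Each label in `diphones` would then serve as a key into the labeled diphone inventory, whose waveforms the DSP module concatenates.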
