Abstract

Extracting pronunciation from written text is necessary in many application areas, especially in text-to-speech synthesis. Bangla is not a fully phonetic language: there is not always a direct mapping from orthography to pronunciation. It mainly suffers from the ‘schwa deletion’ problem, along with ambiguities in certain letters and conjuncts. Rule-based approaches cannot completely solve this problem. In this paper, we propose an encoder-decoder neural machine translation (NMT) model for determining the pronunciation of Bangla words. We cast the pronunciation task as a sequence-to-sequence problem and built our model from two Gated Recurrent Unit recurrent neural networks (GRU-RNNs). We fed the model with two types of input: in one experiment we used ‘raw’ words, and in the other we used ‘pre-processed’ words (normalized by hand-written rules). Both experiments showed promising results, and the resulting models can be used in practical applications.
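The encoder half of such a sequence-to-sequence model can be sketched with the standard GRU update equations (update gate, reset gate, candidate state): the encoder consumes the grapheme sequence one character at a time, and its final hidden state summarizes the word for the decoder. The GRU equations below are the standard formulation; the dimensions, initialization, and one-hot toy input are illustrative assumptions, not the paper's actual configuration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # W is a list of rows; returns W @ v
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def vadd(*vs):
    # element-wise sum of equal-length vectors
    return [sum(t) for t in zip(*vs)]

class GRUCell:
    """Minimal GRU cell over plain Python lists (illustrative sketch)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        self.hidden_size = hidden_size
        # parameters for update gate z, reset gate r, and candidate state h~
        self.Wz, self.Uz, self.bz = mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size
        self.Wr, self.Ur, self.br = mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size
        self.Wh, self.Uh, self.bh = mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size

    def step(self, x, h):
        # z_t = sigmoid(Wz x + Uz h + bz)   -- how much of the state to update
        z = [sigmoid(v) for v in vadd(matvec(self.Wz, x), matvec(self.Uz, h), self.bz)]
        # r_t = sigmoid(Wr x + Ur h + br)   -- how much past state feeds the candidate
        r = [sigmoid(v) for v in vadd(matvec(self.Wr, x), matvec(self.Ur, h), self.br)]
        rh = [r_i * h_i for r_i, h_i in zip(r, h)]
        # h~_t = tanh(Wh x + Uh (r * h) + bh)
        h_tilde = [math.tanh(v) for v in vadd(matvec(self.Wh, x), matvec(self.Uh, rh), self.bh)]
        # h_t = (1 - z) * h + z * h~
        return [(1 - z_i) * h_i + z_i * ht_i
                for z_i, h_i, ht_i in zip(z, h, h_tilde)]

def encode(cell, one_hot_seq):
    """Run the encoder over a grapheme sequence; the final hidden state
    is the fixed-length summary passed to the decoder."""
    h = [0.0] * cell.hidden_size
    for x in one_hot_seq:
        h = cell.step(x, h)
    return h

# Toy usage: a 3-character "word" over a 4-symbol alphabet, encoded one-hot.
cell = GRUCell(input_size=4, hidden_size=5)
summary = encode(cell, [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
```

A decoder GRU would then emit the phoneme sequence step by step, conditioned on this summary; training both halves jointly is what makes the model learn context-dependent rules such as schwa deletion rather than relying on hand-written ones.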
