Exploiting acoustic similarities between Tamil and Indian English in the development of an HMM‐based bilingual synthesiser

Sherlin Solomi Vijayarajsolomon, Vijayalakshmi Parthasarathy, Nagarajan Thangavelu

doi:10.1049/iet-spr.2016.0163

Abstract

In this study, an efficient hidden Markov model (HMM)-based bilingual speech synthesiser for the Indian language Tamil and Indian English is developed. Initially, a phone-mapping approach is tried, in which English text is synthesised from the Tamil corpus alone by mapping English phonemes to perceptually similar Tamil phonemes; this approach is found to be language-dependent and to require a large dictionary for Indian pronunciation. Therefore, given speech data for both languages, the straightforward approach to developing a bilingual synthesiser is either to build a separate synthesiser for each language and combine them, or to merge the perceptually similar phonemes. These approaches introduce language-switching and language-influence in the synthesised speech. To minimise switching and influence, HMM-based bilingual synthesisers are developed by merging acoustically similar phonemes, derived from model parameters and likelihood Gaussians using various distance metrics. The performance of these synthesisers is evaluated in terms of mean opinion score (MOS), language-switching and language-influence. Results reveal that the set of phonemes derived using product-of-likelihood Gaussians in the likelihood space is the optimum set of phonemes that can be merged, and the system developed by merging these phonemes outperforms the rest with an MOS of 3.66. Furthermore, only 8% and 23% of the sentences exhibit language-switching and language-influence, respectively.

Full Text