Comparative Analysis of Kannada Formant Synthesized Utterances and their Quality

Alfred Vivek D’Souza, D J Ravi

doi:10.17485/ijst/v16i5.2091

Abstract

Objectives: The goal of this work is to synthesize Kannada utterances using a modified Klatt type formant synthesizer and to evaluate its performance by comparing it against the eSpeak synthesizer in terms of the intelligibility and quality of the utterances generated.

Methods: Kannada utterances, viz., vowels, diphthongs, Consonant-Vowel (CV) co-articulations and simple words, are generated using a modified Klatt type formant synthesizer and eSpeak. The vowels and diphthongs generated by both synthesizers are compared with natural recorded utterances using F1-F2 formants, and the CV co-articulations are compared using spectrograms. The synthesized word utterances are compared with natural recorded utterances using Log Spectral Distance to determine which synthesizer produces the frequency spectrum closest to that of the natural utterances. The synthesized word utterances are also evaluated for intelligibility and quality using the Mean Opinion Score (MOS) obtained from 10 native Kannada speakers.

Findings: The word utterances synthesized by the modified Klatt type formant synthesizer achieved MOS-based scores of 86% for intelligibility and 4.46 out of 5 for quality, whereas eSpeak scored 70% and 4.14 out of 5, respectively, on the same two parameters.

Novelty: A Klatt type formant synthesizer that uses a pitch-synchronous parameter update method synthesizes good-quality Kannada utterances, and storing the control parameters of the synthesizer as polynomials reduces the database footprint.

Keywords: Kannada Formant Synthesizer; Klatt type Synthesizer; eSpeak; Kannada TTS; Formant synthesis quality
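The Log Spectral Distance comparison named in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact formulation or analysis settings used, so the frame length, hop size, Hann window, and the function names `power_spectra` and `log_spectral_distance` below are assumptions chosen for illustration. The sketch uses one common definition: the root-mean-square difference of the log power spectra in dB per frame, averaged over frames.

```python
import numpy as np


def power_spectra(signal, frame_len=512, hop=256):
    """Return per-frame power spectra (frames x bins) using a Hann window.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, synthesized, frame_len=512, hop=256, eps=1e-10):
    """Mean per-frame log spectral distance in dB between two signals.

    Both signals are truncated to the shorter length; no time alignment
    is attempted (an assumption of this sketch).
    """
    n = min(len(reference), len(synthesized))
    p_ref = power_spectra(reference[:n], frame_len, hop)
    p_syn = power_spectra(synthesized[:n], frame_len, hop)
    # Difference of log power spectra in dB; eps guards against log(0).
    diff_db = 10.0 * np.log10((p_ref + eps) / (p_syn + eps))
    per_frame = np.sqrt(np.mean(diff_db ** 2, axis=1))
    return float(np.mean(per_frame))
```

A lower value indicates that the synthesized spectrum is closer to the natural one. In practice the natural and synthesized recordings would also need to share a sampling rate and be time-aligned before comparison, which this sketch leaves out.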

Full Text