Machine analysis and synthesis of spoken Telugu vowels

V.K.R Maddela

doi:10.1049/cp.2013.2577

Abstract

In this paper, the acoustic characteristics of spoken intrinsic Telugu vowels are studied. A spectrogram is built using short-time linear prediction (LP) analysis of the vowel signal. The trajectories of the spectral peaks (formants) are tracked over time, and the first three formant frequencies are estimated from these trajectories. While the analysis is straightforward for monophthong vowels, it is more involved for the diphthongs, the nasal, and the conjuncts. The vowels are then resynthesized with a cascade formant synthesizer from the estimated formant frequencies, formant bandwidths, and, for diphthongs, formant slopes/transitions. As the study is mainly aimed at vowel quality rather than the naturalness of the sound, the synthesis is carried out using three excitation sources: a band-limited impulse train, a band-limited triangular wave train, and the Liljencrants-Fant (LF) glottal source model. For each vowel, each formant frequency is varied over a range and the quality of the synthesized vowel is assessed subjectively. The range of each formant frequency over which the vowel color/quality is maintained is determined. In each case, the minimum number of formants required to maintain the vowel quality is also determined. The results of this study are useful in Telugu text-to-speech systems, Telugu transcription systems, and Indian music transcription systems.
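
The analysis and synthesis loop described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a single analysis frame rather than a full short-time spectrogram, the autocorrelation method for LP, a plain (not band-limited) impulse train as excitation, and second-order digital resonators in cascade. All function names, the sampling rate, and the formant and bandwidth values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
import numpy as np

def resonator(x, freq, bw, fs):
    """Second-order digital resonator for one formant (unity gain at DC)."""
    c = -np.exp(-2.0 * np.pi * bw / fs)
    b = 2.0 * np.exp(-np.pi * bw / fs) * np.cos(2.0 * np.pi * freq / fs)
    a = 1.0 - b - c
    y = np.zeros(len(x))
    for n in range(len(x)):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = a * x[n] + b * y1 + c * y2
    return y

def cascade_synthesize(excitation, formants, bandwidths, fs):
    """Pass the excitation through one resonator per formant, in series."""
    y = np.asarray(excitation, dtype=float)
    for f, bw in zip(formants, bandwidths):
        y = resonator(y, f, bw, fs)
    return y

def impulse_train(f0, duration, fs):
    """Plain impulse train at pitch f0 (assumption: no band limiting)."""
    x = np.zeros(int(round(duration * fs)))
    x[::int(round(fs / f0))] = 1.0
    return x

def lp_coefficients(x, order):
    """LP polynomial [1, a1, ..., ap] via the autocorrelation method."""
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:n + order]  # lags 0..order
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], -r[1:order + 1])  # Toeplitz normal equations
    return np.concatenate(([1.0], a))

def estimate_formants(x, fs, order):
    """Formant frequencies (Hz) from the angles of the LP polynomial roots."""
    roots = np.roots(lp_coefficients(x, order))
    roots = roots[roots.imag > 0]  # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

# Illustrative values only (hypothetical, not taken from the paper).
fs = 10000
formants = (700.0, 1200.0, 2500.0)
bandwidths = (60.0, 90.0, 120.0)
vowel = cascade_synthesize(impulse_train(100.0, 0.3, fs), formants, bandwidths, fs)
estimated = estimate_formants(vowel, fs, order=6)
```

An LP order of 6 is used here because three resonators contribute exactly six poles. On recorded speech a higher order and a tapered analysis window would normally be needed, and the estimates would be repeated frame by frame to obtain the formant trajectories mentioned above.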
