VOICE: An integrated speech recognition synthesis system for the Hindi language

P.V.S Rao

doi:10.1016/0167-6393(93)90071-r

Abstract

A Voice Oriented Interactive Computing Environment (VOICE) has been implemented in the Hindi language. The system provides an interactive facility with visual and voice feedback. The 200-word isolated word recognition system is designed around a railway reservation enquiry task and uses acoustic-phonetic segments as the basic units of recognition. Frame-level classification into broad acoustic-phonetic categories is accomplished by a maximum likelihood classifier, and segmentation by hierarchical clustering of the frame-level likelihood vectors using explicit-duration (hidden) semi-Markov models. A more detailed classification of a few categories (vowels, voice bar and nasals in the first instance) is performed by neural nets. Lexical access, that is, the conversion of the phonetic category symbol strings into words, is accomplished by string matching using dynamic programming. Distributed processing of the word recognition task enables recognition at four times real time. A language processor disambiguates between the multiple choices given by the recognizer for each word and even corrects some acoustic-level recognition errors. This system, the first working in any Indian language, gives a recognition performance of 85% at the word level. For comparison, a purely HMM-based word-level recognizer has also been implemented. The performance is expected to improve further, as there is still substantial scope for refinement.
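
The lexical access step can be illustrated with a minimal sketch, assuming plain unit-cost edit distance as the dynamic programming string match. The abstract does not specify the cost scheme or the symbol inventory, so the category symbols ("S", "V", "N", "F"), the lexicon entries (`word_a`, `word_b`, `word_c`) and the function names below are all hypothetical placeholders rather than the system's actual design. The sketch returns several ranked candidates per word, which is the kind of multiple-choice output a language processor could then disambiguate.

```python
def edit_distance(observed, reference):
    """Unit-cost edit distance between two symbol sequences (assumed cost scheme)."""
    # prev[j] holds the distance between the observed prefix so far and reference[:j].
    prev = list(range(len(reference) + 1))
    for i, obs in enumerate(observed, start=1):
        curr = [i]  # distance from observed[:i] to an empty reference
        for j, ref in enumerate(reference, start=1):
            substitution = prev[j - 1] + (obs != ref)
            deletion = prev[j] + 1       # drop a symbol from the observed string
            insertion = curr[j - 1] + 1  # add a symbol missing from the observed string
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def lexical_access(observed, lexicon, n_best=3):
    """Return the n_best lexicon words closest to the observed symbol string."""
    scored = sorted(
        (edit_distance(observed, symbols), word)
        for word, symbols in lexicon.items()
    )
    return [(word, dist) for dist, word in scored[:n_best]]


# Hypothetical lexicon: each word is stored as a string of broad category symbols.
LEXICON = {
    "word_a": ["S", "V", "N", "V"],
    "word_b": ["F", "V", "S", "V"],
    "word_c": ["S", "V", "S"],
}

if __name__ == "__main__":
    observed = ["S", "V", "N"]  # hypothetical output of the segment classifier
    for word, dist in lexical_access(observed, LEXICON):
        print(word, dist)
```

A real system would more plausibly weight substitutions by the confusability of the categories involved instead of using unit costs; that refinement is left out here because the abstract gives no details.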
