Reducing the complexity of a triphone-based speech recognition system based on degree of coarticulation

P Vijayalakshmi, B Bharathi, T Nagarajan, A Abraham

doi:10.1109/techsym.2011.5783820

Abstract

In developing a speech recognition system, selecting a good sub-word unit is essential, since this choice governs both the accuracy and the complexity of the system. The triphone is a useful sub-word unit because it models a phone in context and thus captures the most important coarticulation effects. As both left and right contexts are involved, the number of such triphones is very large. However, not all contexts show the same degree of coarticulation effect on the base phoneme. In this work, based on an analysis of the coarticulation effect in consonant-vowel (CV) and vowel-consonant (VC) combinations, triphones that capture a negligible coarticulation effect are replaced by the base monophone in a triphone-based speech recognition system for Hindi. The resulting system uses a combination of monophone and triphone models, and the overall number of models is reduced by 32% compared with the all-triphone system, with an increase of 1% in recognition accuracy. For the analysis of the coarticulation effect, we use the pole-focused linear-prediction-based spectrogram, which gives clear formant transition information.
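
The replacement step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `l-p+r` label format (as used in HTK-style toolkits), the phone labels, the function names, and the rule that a triphone is backed off only when both its left and right contexts have been judged negligible are all assumptions made for illustration. The sets `weak_left` and `weak_right` stand in for the outcome of the CV/VC coarticulation analysis, which the abstract does not specify in detail.

```python
def split_triphone(label):
    """Split an 'l-p+r' label into (left, base, right). Label format is assumed."""
    left, rest = label.split("-", 1)
    base, right = rest.split("+", 1)
    return left, base, right


def reduce_models(triphones, weak_left, weak_right):
    """Replace a triphone by its base monophone when both contexts are
    flagged as having negligible coarticulation effect (assumed rule)."""
    reduced = []
    for label in triphones:
        left, base, right = split_triphone(label)
        if (left, base) in weak_left and (base, right) in weak_right:
            reduced.append(base)    # back off to the monophone model
        else:
            reduced.append(label)   # keep the context-dependent model
    return reduced


def model_reduction(original, reduced):
    """Fraction by which the number of distinct models shrinks."""
    return 1 - len(set(reduced)) / len(set(original))


# Hypothetical labels and flags, not taken from the paper.
triphones = ["k-a+t", "p-a+t", "k-i+t", "s-a+m"]
weak_left = {("k", "a"), ("p", "a")}   # (left context, base phone) pairs
weak_right = {("a", "t")}              # (base phone, right context) pairs

reduced = reduce_models(triphones, weak_left, weak_right)
reduction = model_reduction(triphones, reduced)
```

In this toy inventory, two triphones collapse onto the single monophone `a`, so four distinct models become three. The 32% figure reported in the abstract comes from the authors' Hindi system, not from a calculation like this one.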
