Abstract

In developing a speech recognition system, the choice of sub-word unit is critical, since it governs both the accuracy and the complexity of the system. The triphone is a useful sub-word unit because it models a phone in context and thus captures the most important coarticulation effects. As both left and right contexts are involved, the number of possible triphones is very large. However, not all contexts exert the same degree of coarticulation effect on the base phoneme. In this work, based on an analysis of the coarticulation effect in consonant-vowel (CV) and vowel-consonant (VC) combinations, triphones that capture a negligible coarticulation effect are replaced by the base monophone in a triphone-based speech recognition system for Hindi. The resulting system uses a combination of monophone and triphone models, and the overall number of models is reduced by 32% compared to the all-triphone system, with a 1% improvement in recognition accuracy. For the analysis of the coarticulation effect, we have utilised the pole-focused linear-prediction-based spectrogram, which gives clear formant transition information.
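The replacement step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: triphone labels are written in the `left-base+right` style, the phone labels and the `negligible_pairs` set are invented for illustration, and a triphone is collapsed to its monophone only when both of its contexts are flagged as negligible. The paper derives its own context classes from spectrogram analysis, and its actual decision rule may differ.

```python
# Hypothetical sketch: labels, the negligible-pair set, and the
# "both contexts negligible" rule are illustrative assumptions.

def parse_triphone(label):
    """Split a 'left-base+right' label into (left, base, right)."""
    left, rest = label.split("-", 1)
    base, right = rest.split("+", 1)
    return left, base, right


def reduce_model_set(triphones, negligible_pairs):
    """Return the mixed monophone/triphone model set.

    negligible_pairs holds (context_phone, base_phone) pairs assumed to
    show negligible coarticulation on the base phone (hypothetical data).
    """
    models = set()
    for label in triphones:
        left, base, right = parse_triphone(label)
        if (left, base) in negligible_pairs and (right, base) in negligible_pairs:
            models.add(base)   # fall back to the base monophone
        else:
            models.add(label)  # keep the context-dependent model
    return models


# Toy inventory (invented, not Hindi data from the paper).
triphones = ["k-a+t", "p-a+t", "s-a+m", "k-i+t", "m-a+s"]
negligible_pairs = {("s", "a"), ("m", "a")}

models = reduce_model_set(triphones, negligible_pairs)
reduction = 1 - len(models) / len(triphones)  # fraction of models saved
```

In this toy inventory, `s-a+m` and `m-a+s` both collapse to the single monophone `a`, so five triphones become four models. The saving comes from many triphones sharing one monophone, which is the mechanism behind the reduction in model count reported in the abstract.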
