Abstract

The performance of a speech recognition system depends on the local conditions within each speaker's utterance, so it is important to capture this local variation. To capture this information and the dynamic changes within an utterance using continuous density hidden Markov models (CDHMMs), we propose a novel approach in which the features of each utterance are extracted using multiple frame sizes and multiple frame rates, and the CDHMMs are trained on these features. The performance of the recognition system using multiple frame size (MFS) and multiple frame rate (MFR) feature extraction is compared with that of a single frame size system, in which the window size and frame rate are fixed. For a gender-dependent speech recognition system, this approach yields an observable improvement of 4% over the recognition system using single frame size feature extraction.
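As a rough illustration of the MFS/MFR idea, the sketch below extracts one feature matrix per (frame size, frame rate) combination. The feature type (MFCCs via librosa), the specific window sizes and frame shifts, and the function name are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of multi-frame-size / multi-frame-rate (MFS/MFR) feature
# extraction. Frame sizes, frame shifts, and the use of MFCCs are assumed
# for illustration; the paper's exact configuration may differ.
import librosa
import numpy as np

def extract_mfs_mfr_features(wav_path,
                             frame_sizes_ms=(20, 25, 30),   # assumed window sizes
                             frame_shifts_ms=(5, 10, 15),   # assumed frame shifts
                             n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=None)
    feature_sets = []
    for win_ms in frame_sizes_ms:
        for hop_ms in frame_shifts_ms:
            win = int(sr * win_ms / 1000)
            hop = int(sr * hop_ms / 1000)
            # One MFCC matrix (n_mfcc x n_frames) per (frame size, frame rate) pair
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                        n_fft=win, win_length=win,
                                        hop_length=hop)
            feature_sets.append(((win_ms, hop_ms), mfcc))
    return feature_sets
```

Each of the resulting feature sets could then be used to train or decode with a CDHMM, so that the recognizer is exposed to the same utterance analyzed at several temporal resolutions.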
