Abstract

An automatic speech recognition (ASR) system produces a text transcript from a given speech signal. The signal may contain either isolated words or large vocabulary continuous speech (LVCS). Isolated words can be recognized with high accuracy in a clean environment, but recognizing continuous speech depends on several factors, such as the speech corpus, the speaker, and environmental noise, that directly affect the accuracy of the ASR system. In the proposed work, a hybrid feature extraction technique combines perceptual linear predictive (PLP) and mel frequency cepstral coefficient (MFCC) features to improve ASR accuracy in noisy environments. Voice activity detection (VAD)-based frame dropping is used to improve phoneme modeling by removing pauses and distorted elements from the speech signal. The proposed hybrid model with VAD is implemented using a self-generated speech corpus and shows a relative increase of 12% in recognition rate compared with the state-of-the-art methodology.

Keywords: Automatic speech recognition (ASR); Perceptual linear predictive (PLP); Mel frequency cepstral coefficient (MFCC); Voice activity detection (VAD)
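
The abstract does not say which VAD algorithm is used or how the PLP and MFCC features are combined. The following is therefore only a minimal sketch under two assumptions: a simple energy-threshold VAD, and per-frame concatenation of the two feature streams. All function names, the 30 dB margin, and the toy signal are hypothetical, and the placeholder feature vectors merely stand in for real MFCC and PLP coefficients.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames (a trailing partial frame is discarded)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_energy_db(frame, eps=1e-12):
    """Mean-square energy of one frame, in decibels."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(energy + eps)

def vad_keep_mask(frames, margin_db=30.0):
    """Assumed VAD rule: a frame counts as speech if its energy is within
    margin_db of the loudest frame in the utterance."""
    energies = [frame_energy_db(f) for f in frames]
    threshold = max(energies) - margin_db
    return [e >= threshold for e in energies]

def hybrid_features(feats_a, feats_b, keep):
    """Assumed combination: concatenate the two per-frame vectors,
    dropping every frame the VAD flagged as non-speech."""
    return [a + b for a, b, k in zip(feats_a, feats_b, keep) if k]

# Toy signal: silence, a 200 Hz tone, silence (8 kHz sampling, illustrative only).
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
signal = [0.0] * 800 + tone + [0.0] * 800

frames = frame_signal(signal, frame_len=200, hop=100)
keep = vad_keep_mask(frames)

# Placeholder one-dimensional streams standing in for MFCC and PLP vectors.
mfcc_like = [[frame_energy_db(f)] for f in frames]
plp_like = [[max(abs(x) for x in f)] for f in frames]
features = hybrid_features(mfcc_like, plp_like, keep)

print(len(frames), sum(keep), len(features))
```

In a real system the two placeholder streams would be replaced by MFCC and PLP vectors computed on the same framing, so that the VAD mask lines up with both.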

