Abstract

An algorithm for adaptive truncation of Mel frequency cepstral coefficient (MFCC) features is proposed to improve automatic speech recognition (ASR) performance under acoustically mismatched conditions. Using a relationship found between the truncation of MFCC base features and the degree of acoustic mismatch between a speech signal and the recognition models, the proposed algorithm performs utterance-specific MFCC feature truncation on test signals to address their acoustic mismatch in the context of ASR. Without any prior knowledge of the speaker of the test utterance, the proposed technique gives relative improvements over the baseline of 38% on a connected-digit recognition task and 36% on a continuous speech recognition task for children's speech recognized with models trained on adult speech. These improvements are also found to be additive to those obtained with vocal tract length normalization and/or constrained maximum likelihood linear regression. The generality and effectiveness of the algorithm are also validated for automatic recognition of children's and adults' speech under matched and mismatched conditions.
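To make the idea of utterance-specific truncation concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame is a list of base cepstral coefficients and that truncation means keeping only the first few of them. The abstract does not state how the truncation length is chosen per utterance, so the sketch takes a caller-supplied scoring function as a stand-in; the names `truncate_base_features`, `pick_truncation`, and `score` are hypothetical.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def truncate_base_features(frames: Sequence[Sequence[float]], keep: int) -> List[Frame]:
    """Keep only the first `keep` base coefficients of every frame."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return [list(frame[:keep]) for frame in frames]


def pick_truncation(
    frames: Sequence[Sequence[float]],
    candidates: Sequence[int],
    score: Callable[[List[Frame]], float],
) -> int:
    """Return the candidate length whose truncated features score highest.

    `score` is an assumption of this sketch: any function that rates how
    well the truncated utterance fits the recognition models.
    """
    return max(candidates, key=lambda k: score(truncate_base_features(frames, k)))


if __name__ == "__main__":
    # Toy utterance: two frames with four coefficients each.
    utterance = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    # Toy score that simply prefers three coefficients per frame.
    toy_score = lambda feats: -abs(len(feats[0]) - 3)
    best = pick_truncation(utterance, [2, 3, 4], toy_score)
    print(best, truncate_base_features(utterance, best))
```

In a real system the scoring function would come from the recognizer itself, and the candidate lengths would be bounded by the dimensionality of the base features used to train the models.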
