Abstract

Automatic speech recognition (ASR) is an emerging field with the goal of creating a more natural man/machine interface. The single largest obstacle to widespread use of ASR technology is robustness to noise. Since human speech recognition greatly outperforms current ASR systems in noisy environments, ASR systems seek to improve noise robustness by drawing on biological inspiration. Most ASR front ends employ mel frequency cepstral coefficients (mfcc), a filter bank-based algorithm whose filters are spaced on a linear-log frequency scale. Although filter center frequency is based on a perceptually motivated frequency scale, filter bandwidth is set by filter spacing rather than by biological motivation. Because filter bandwidth is coupled to the other filter bank parameters (frequency range, number of filters), variations of the original algorithm have emerged with different filter bandwidths. In this work, a novel extension to mfcc is introduced that decouples filter bandwidth from the rest of the filter bank parameters by employing the relationship between filter center frequency and the critical bandwidth of the human auditory system. The new algorithm, called human factor cepstral coefficients (hfcc), is shown to outperform the original mfcc and two popular variations across several ASR experiments and noise sources.
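The coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common mel mapping mel = 2595 log10(1 + f/700) and, for the critical bandwidth, the Moore and Glasberg equivalent rectangular bandwidth (ERB) polynomial; the abstract names neither formula, so both are assumptions here. All function names are hypothetical. For simplicity the sketch reuses the same mel-spaced center frequencies in both cases, which a full hfcc design may not do.

```python
import math


def hz_to_mel(f_hz):
    # Assumed mel mapping (a common variant; not specified in the abstract).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(f_min, f_max, n_filters):
    # Center frequencies equally spaced on the mel scale, band edges excluded.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]


def spacing_bandwidths(f_min, f_max, n_filters):
    # Coupled case: each triangular filter spans from the previous center
    # to the next one, so its width follows from range and filter count.
    pts = [f_min] + mel_centers(f_min, f_max, n_filters) + [f_max]
    return [pts[i + 2] - pts[i] for i in range(n_filters)]


def erb_bandwidth(fc_hz):
    # Decoupled case: bandwidth depends only on center frequency.
    # Assumed ERB polynomial (Moore and Glasberg), f in kHz, result in Hz.
    f_khz = fc_hz / 1000.0
    return 6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52


if __name__ == "__main__":
    for n in (20, 40):
        centers = mel_centers(0.0, 4000.0, n)
        coupled = spacing_bandwidths(0.0, 4000.0, n)
        print(f"{n} filters, first filter: center {centers[0]:.1f} Hz, "
              f"spacing-based width {coupled[0]:.1f} Hz, "
              f"ERB width {erb_bandwidth(centers[0]):.1f} Hz")
```

In the coupled case, doubling the number of filters roughly halves every filter's width, whereas the ERB-based width at a given center frequency is unaffected by the filter count or the frequency range.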
