Abstract

This paper presents a low-complexity, effective variable frame rate (VFR) analysis method that selects frames on the basis of an a posteriori signal-to-noise ratio (SNR) weighted energy distance. The method has two characteristics. First, energy distance is used instead of cepstral distance, which makes the method computationally efficient and thus allows a finer search granularity than a cepstral distance criterion. Second, SNR weighting is used to emphasize the reliable regions in noisy speech signals. In terms of frame selection, it is found experimentally that the method assigns a higher frame rate to fast-changing events such as consonants, a lower frame rate to steady regions such as vowels, and no frames to silence, even for very low SNR signals. The VFR method is applied to speech recognition in noisy environments to improve noise robustness. Because the method operates in the time domain, it is also combined with spectral- and cepstral-domain techniques to gain further improvement. Experiments are conducted on the Aurora 2 database, which is the TI digits database artificially distorted by adding different types of noise, and encouraging results are obtained.
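The frame selection idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the specific choices here are assumptions made for illustration. The distance is taken as the absolute difference of log energies between adjacent densely spaced frames, the weight is the log a posteriori SNR floored at zero, the noise energy is estimated from the first few frames (assumed to contain no speech), and a frame is selected whenever the accumulated weighted distance exceeds a threshold set to a multiple of the mean weighted distance. The names `frame_energies`, `select_frames`, `noise_frames`, and `factor` are hypothetical.

```python
import numpy as np


def frame_energies(signal, frame_len, hop):
    """Short-time energy of overlapping frames taken every `hop` samples."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_len)[None, :]
    return np.sum(signal[idx] ** 2, axis=1) + 1e-12  # floor avoids log(0)


def select_frames(signal, frame_len=200, hop=8, noise_frames=10, factor=10.0):
    """Return indices of the dense frames kept by the VFR sketch.

    All defaults are illustrative assumptions, not values from the paper.
    """
    energy = frame_energies(signal, frame_len, hop)
    log_e = np.log(energy)

    # Assumption: the leading frames contain noise only.
    noise_energy = np.mean(energy[:noise_frames])

    # A posteriori SNR in the log domain, floored at zero so that frames
    # at or below the noise level contribute nothing.
    snr_weight = np.maximum(log_e - np.log(noise_energy), 0.0)

    # Energy distance between adjacent dense frames, weighted by SNR.
    dist = np.abs(np.diff(log_e))
    weighted = snr_weight[1:] * dist

    # Assumption: threshold proportional to the mean weighted distance.
    threshold = factor * weighted.mean()

    selected = []
    acc = 0.0
    for t, d in enumerate(weighted, start=1):
        acc += d
        if acc > threshold:
            selected.append(t)  # keep this frame and restart accumulation
            acc = 0.0
    return np.array(selected, dtype=int)
```

Under these assumptions, regions where the energy changes quickly and the SNR is high cross the threshold often and receive many frames, steady high-SNR regions receive few, and regions near the noise floor receive almost none because their weight is close to zero. Because only one energy value per frame is needed, a small `hop` (dense search) stays cheap, which is the efficiency argument made in the abstract.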
