Abstract

In recent years, the wavelet transform has proved to be an effective tool for time-frequency analysis of non-stationary and quasi-stationary signals such as speech, and it has been used for feature extraction in speech recognition applications. Here we propose a wavelet-based feature extraction technique that captures both periodic and aperiodic information, along with the sub-band instantaneous frequency of the speech signal, for robust speech recognition in noisy environments. The technique is based on parallel distributed processing inspired by the human speech perception process. This front-end employs an equivalent rectangular bandwidth (ERB) filter-like wavelet speech feature extraction method called Wavelet ERB Sub-band based Periodicity and Aperiodicity Decomposition (WERB-SPADE), and we examine its validity on the TIMIT phone recognition task in noisy environments. The speech signal is filtered by a 24-band ERB-like wavelet filter bank, and the equal-loudness pre-emphasized output of each band is then processed through a comb filter. Each comb filter is designed individually for its frequency sub-band to decompose the signal into periodic and aperiodic features. The method thus takes advantage of the robustness of periodic features without losing important information, such as formant transitions, carried by the aperiodic features. Speech recognition experiments are conducted with a standard HMM recognizer under both clean-training and multi-training conditions. The proposed technique shows greater robustness than other features, especially in noisy conditions.
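
The per-band comb-filter step can be illustrated with a minimal sketch. It is not the authors' filter design, which the abstract does not specify. It assumes a simple two-tap feed-forward comb filter and a pitch period already known in samples; the function names `comb_decompose` and `energy`, the 8 kHz sampling rate, and the 100 Hz test tone are all hypothetical choices made for illustration.

```python
import math

def comb_decompose(band, period):
    """Split one sub-band signal into periodic and aperiodic parts.

    Assumption: a two-tap feed-forward comb filter with a known pitch
    period (in samples). The paper's actual comb filter design is not
    described in the abstract.
    """
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    periodic, aperiodic = [], []
    for n, x in enumerate(band):
        # Sample one pitch period earlier (zero before the signal starts).
        delayed = band[n - period] if n >= period else 0.0
        p = 0.5 * (x + delayed)   # reinforces content repeating every `period` samples
        periodic.append(p)
        aperiodic.append(x - p)   # residual, equal to 0.5 * (x - delayed)
    return periodic, aperiodic

def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)

# Hypothetical test signal: a 100 Hz tone sampled at 8 kHz (period = 80 samples).
period = 80
tone = [math.sin(2 * math.pi * n / period) for n in range(800)]
periodic, aperiodic = comb_decompose(tone, period)
```

For a perfectly periodic input such as this tone, almost all energy after the first pitch period falls in the periodic output, while the two outputs always sum back to the input. Non-repeating content, such as a rapid transition, would instead leave energy in the aperiodic output.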
