Acoustic features for speech recognition based on Gammatone filterbank and instantaneous frequency

Hui Yin, Volker Hohmann, Climent Nadeu

doi:10.1016/j.specom.2010.04.008

Abstract

Most of the features used by modern automatic speech recognition systems, such as mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) coefficients, represent only the spectral envelope of the speech signal. However, phase or frequency modulation, as represented in recent perceptual models of the peripheral auditory system, might also contribute to speech decoding, and features that capture it can be complementary to the envelope features. This paper proposes a variety of features based on a linear auditory filterbank, the Gammatone filterbank. Envelope features are derived from the envelope of the subband filter outputs. Phase/frequency modulation is represented by the subband instantaneous frequency (IF) and is used either explicitly, by concatenating envelope-based and IF-based features, or implicitly, by IF-based frequency reassignment. Speech recognition experiments with a standard HMM-based recognizer, under both clean training and multi-condition training, are conducted on a Mandarin Chinese digits corpus. The experimental results show that the proposed envelope- and phase-based features can improve recognition rates in clean and noisy conditions compared to the reference MFCC-based recognizer.
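As a rough illustration of the two subband quantities the abstract refers to, the sketch below computes an envelope and an instantaneous frequency for a single Gammatone band. It is not the authors' implementation. The filter design (a cascade of one-pole complex filters), the Glasberg-Moore ERB formula, the 1.019 bandwidth factor, the gain normalization, and the function names `erb_hz`, `gammatone_subband`, and `envelope_and_if` are all assumptions made for this example.

```python
import cmath
import math


def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, an assumption here)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_subband(x, fc, fs, order=4):
    """Filter real samples x with one complex-valued gammatone band centred at fc.

    The band is built as a cascade of `order` identical one-pole complex
    filters; the input is scaled so the gain at fc is 1 for the
    positive-frequency component.
    """
    lam = math.exp(-2.0 * math.pi * 1.019 * erb_hz(fc) / fs)  # pole radius
    pole = lam * cmath.exp(2j * math.pi * fc / fs)            # pole at the centre frequency
    gain = (1.0 - lam) ** order
    y = [gain * complex(v) for v in x]
    for _ in range(order):
        state = 0j
        stage_out = []
        for v in y:
            state = v + pole * state
            stage_out.append(state)
        y = stage_out
    return y


def envelope_and_if(y, fs):
    """Return (envelope, instantaneous frequency in Hz) of a complex subband signal.

    The IF is the phase advance between consecutive samples, so it has one
    element fewer than the envelope.
    """
    env = [abs(v) for v in y]
    inst_freq = [
        cmath.phase(cur * prev.conjugate()) * fs / (2.0 * math.pi)
        for prev, cur in zip(y, y[1:])
    ]
    return env, inst_freq


if __name__ == "__main__":
    fs = 16000
    tone = [math.cos(2.0 * math.pi * 1000.0 * n / fs) for n in range(2000)]
    env, inst_freq = envelope_and_if(gammatone_subband(tone, 1000.0, fs), fs)
    print(env[-1], inst_freq[-1])
```

Because the subband output is complex-valued, its magnitude gives the envelope and its sample-to-sample phase advance gives the IF, with no separate Hilbert transform. How these quantities are turned into recognizer features, and how IF-based frequency reassignment is carried out, is not specified in the abstract and is not attempted here.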

Journal: Speech Communication
Publication Date: May 7, 2010
Citations: 70

Similar Papers

Effects of the Dynamic and Energy Based Feature Extraction on Hindi Speech Recognition
Shobha Bhatt ... Amita Dev
Recent Advances in Computer Science and Communications | VOL. 14
30 Aug 2021

Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using wiener filter
Paresh M Chauhan ... Nikita P Desai
01 Mar 2014

Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks
Gurpreet Kaur ... Amod Kumar
Journal of Telecommunications and Information Technology | VOL. 2
29 Jun 2018

Performance Analysis of various Front-end and Back End Amalgamations for Noise-robust DNN-based ASR
Mohit Dua ... Vinam Agrawal
Recent Advances in Computer Science and Communications | VOL. 14
01 Dec 2021
