Auditory processing-based features for improving speech recognition in adverse acoustic conditions

Hari Krishna Maganti, Marco Matassoni

doi:10.1186/1687-4722-2014-21

Abstract

The paper describes an auditory processing-based feature extraction strategy for robust speech recognition in environments where conventional automatic speech recognition (ASR) approaches are not successful. It combines gammatone filtering, modulation spectrum and non-linearity for feature extraction in the recognition chain to improve the robustness of ASR in adverse acoustic conditions. Experimental results on the standard Aurora-4 large vocabulary evaluation task show that the proposed features provide reliable and considerable improvement in robustness across different noise conditions and are comparable to those of standard feature extraction techniques.

Highlights

Present technological advances in speech processing systems aim at providing robust and reliable interfaces for practical deployment
Additive noise from interfering noise sources and convolutive noise arising from acoustic environment and transmission channel characteristics mostly contribute to the degradation of speech intelligibility as well as the performance of speech recognition systems
This article addresses the problem of achieving robustness in large vocabulary automatic speech recognition (ASR) systems by incorporating principles inspired by cochlear processing in the human auditory system

Summary

Introduction

Present technological advances in speech processing systems aim at providing robust and reliable interfaces for practical deployment. The gammatone filter bank, with non-uniform bandwidths and non-uniform spacing of center frequencies, provided better robustness in adverse noise conditions for speech recognition tasks [12,13,14,15]. Another important characteristic, the modulation spectrum of speech, represents low-frequency temporal modulation components and is important for speech intelligibility [16,17]. The effects of rectification, non-linearities, short-term adaptation and low-pass filtering were shown to contribute the most to robustness at low SNRs. In another study [8], techniques motivated by human auditory processing were shown to improve the accuracy of automatic speech recognition systems.
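The non-uniform spacing and bandwidths mentioned above can be illustrated with a minimal sketch. It assumes the commonly used equivalent rectangular bandwidth (ERB) scale of Glasberg and Moore for placing gammatone center frequencies; the summary does not state which scale, channel count or frequency range the authors used, so the function names, the 32 channels and the 100 to 4000 Hz range below are hypothetical choices for illustration only.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def gammatone_centers(num_channels, f_low, f_high):
    """Center frequencies equally spaced on the ERB-rate scale.

    Equal steps on the ERB-rate scale become unequal steps in Hz,
    which gives the non-uniform spacing described in the text.
    """
    if num_channels < 2:
        raise ValueError("num_channels must be at least 2")
    lo = hz_to_erb_rate(f_low)
    hi = hz_to_erb_rate(f_high)
    step = (hi - lo) / (num_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(num_channels)]


# Hypothetical configuration, not taken from the paper.
centers = gammatone_centers(num_channels=32, f_low=100.0, f_high=4000.0)
bandwidths = [erb_bandwidth(fc) for fc in centers]

# Print a few channels to show that spacing and bandwidth both grow
# with center frequency.
for idx in (0, 1, 2, 29, 30, 31):
    print(f"channel {idx:2d}: fc = {centers[idx]:8.1f} Hz, "
          f"ERB = {bandwidths[idx]:6.1f} Hz")
```

Low channels end up narrow and closely packed while high channels are wide and far apart, which is the property the text attributes to the gammatone filter bank. The filter impulse responses themselves are not shown here.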

Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: May 6, 2014
Citations: 35
License type: CC BY 2.0
