Bat-inspired dynamic features and factors that modulate their impact on speech recognition

Alexander Hsu, Kartik Audhkhasi, Jin-Ping Han, Tabassum Ahmed, Joseph Sutlive, Xiaodong Cui, Rolf Müller, Anupam Kumar Gupta

doi:10.1121/1.5068236

Abstract

One of the most serious remaining challenges in speech recognition is the corruption of the speech signal by nuisance speech from other talkers (babble). A promising approach to separating the signal of interest from the distractors is to inject direction-dependent signatures into all received signals. This has been realized with a bat-inspired biomimetic pinna, a "dynamic periphery": changing the shape of the pinna during recording introduces substantial time-variant signatures into the speech signals. To investigate the utility of these signatures, we used bioinspired signal representations (cochleagram and spikegram) as input to speech classifiers based on Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The speech samples were obtained from open-source databases: spoken digits and letters of the alphabet from Carnegie Mellon University were mixed with babble or noise samples from Columbia University. Because the time-variant signatures were found to depend strongly on the direction of the sound source, we included datasets from different source directions in the training and testing data fed to the classifiers. The results indicate that the dynamic periphery can substantially improve recognition and that this effect depends on the signal representation as well as on the angular composition of the training dataset.
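
The abstract names the processing chain (speech mixed with babble, a bioinspired representation, a GMM classifier) but not its parameters. The following is a minimal sketch in Python with NumPy of two of those steps, under stated assumptions, and is not the authors' implementation. It assumes babble is scaled to a chosen signal-to-noise ratio before being added (the SNRs used in the study are not given here), and that each class, for example each spoken digit, gets its own diagonal-covariance GMM fitted by expectation-maximization (EM), with a test utterance assigned to the class whose model gives its feature frames the highest total log-likelihood. The names `mix_at_snr`, `DiagGMM`, and `classify` are hypothetical. In practice the frames would be cochleagram or spikegram columns, which are not computed here, and the HMM variant is not shown.

```python
import numpy as np


def mix_at_snr(speech, babble, snr_db):
    """Add `babble` to `speech`, scaled so the speech-to-babble power ratio is `snr_db` dB."""
    speech = np.asarray(speech, dtype=float)
    babble = np.asarray(babble, dtype=float)[: len(speech)]  # assumes babble is at least as long
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    # Choose a gain so that p_speech / (gain**2 * p_babble) == 10**(snr_db / 10)
    gain = np.sqrt(p_speech / (p_babble * 10.0 ** (snr_db / 10.0)))
    return speech + gain * babble


class DiagGMM:
    """Diagonal-covariance Gaussian mixture fitted with a fixed number of EM iterations."""

    def __init__(self, n_components=2, n_iter=50, seed=0):
        self.k, self.n_iter = n_components, n_iter
        self.rng = np.random.default_rng(seed)

    def _log_component(self, X):
        # log w_k + log N(x | mu_k, diag(var_k)); rows are frames, columns are components
        diff = X[:, None, :] - self.mu[None, :, :]
        ll = -0.5 * np.sum(diff ** 2 / self.var + np.log(2 * np.pi * self.var), axis=2)
        return ll + np.log(self.w)

    def fit(self, X):
        n, _ = X.shape
        self.mu = X[self.rng.choice(n, self.k, replace=False)]  # init means from random frames
        self.var = np.tile(X.var(axis=0) + 1e-6, (self.k, 1))
        self.w = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: responsibilities, shifting by the row max for numerical stability
            lc = self._log_component(X)
            lc -= lc.max(axis=1, keepdims=True)
            r = np.exp(lc)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: weights, means, and variances (floored to avoid collapse)
            nk = r.sum(axis=0) + 1e-10
            self.w = nk / n
            self.mu = (r.T @ X) / nk[:, None]
            self.var = np.maximum((r.T @ X ** 2) / nk[:, None] - self.mu ** 2, 1e-6)
        return self

    def score(self, X):
        """Total log-likelihood of all frames in X (log-sum-exp over components)."""
        lc = self._log_component(X)
        m = lc.max(axis=1, keepdims=True)
        return float(np.sum(m[:, 0] + np.log(np.exp(lc - m).sum(axis=1))))


def classify(models, frames):
    """Return the label whose GMM assigns the frames the highest total log-likelihood."""
    return max(models, key=lambda label: models[label].score(frames))
```

As a synthetic usage example, fit one `DiagGMM` per class on that class's training frames and call `classify(models, test_frames)`. Diagonal covariances keep the EM updates simple, and the variance floor prevents a component from collapsing onto a single frame.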

Published in: The Journal of the Acoustical Society of America

Similar Papers

HMM Mixtures (HMM2) for Robust Speech Recognition, 01 Jan 2003

Novel speech processing techniques for robust automatic speech recognition, 01 Jan 2006

Integrate template matching and statistical modeling for continuous speech recognition, Xie Sun, 01 Jan 2010

Using Auxiliary Sources of Knowledge for Automatic Speech Recognition, 01 Jan 2004
