Abstract

One of the most serious remaining challenges in speech recognition is dealing with corruption of the speech signal by other, interfering speech (babble). A promising approach to separating the signal of interest from the distractors is to inject direction-dependent signatures into all received signals, which has been realized with a bat-inspired biomimetic pinna, a dynamic periphery. Changing the shape of the biomimetic pinna during recording introduces substantial time-variant signatures into the speech signals. To investigate the utility of these signatures, we used bioinspired signal representations (cochleagram and spikegram) as input to speech classifiers based on Gaussian mixture models (GMM) and hidden Markov models (HMM). The speech samples were obtained from open-source databases: spoken digits and letters of the alphabet from Carnegie Mellon University were mixed with babble or noise samples from Columbia University. Since the time-variant signatures were found to depend strongly on the direction of the sound source, we included datasets from different directions in the training and testing sets fed to the classifiers. The results indicate that a dynamic periphery can substantially improve recognition and that these effects depend on the signal representation as well as on the angular composition of the training dataset.
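As a rough illustration of the mixing step mentioned above, the sketch below adds a babble recording to a clean speech recording at a chosen signal-to-noise ratio. The abstract does not state how the mixtures were scaled, so the SNR-based gain, the function name `mix_at_snr`, and the looping of a short babble clip are all assumptions made for this example, not the authors' procedure.

```python
import numpy as np


def mix_at_snr(speech, babble, snr_db):
    """Add babble to speech so the mixture has the requested SNR in dB.

    Hypothetical helper: the scaling rule is an assumption, not taken
    from the source.
    """
    speech = np.asarray(speech, dtype=float)
    babble = np.asarray(babble, dtype=float)

    # Loop the babble if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(speech) / len(babble)))
    babble = np.tile(babble, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    if p_babble == 0.0:
        raise ValueError("babble signal is silent; cannot scale to an SNR")

    # Choose the gain so that p_speech / (gain**2 * p_babble) = 10**(snr_db / 10).
    gain = np.sqrt(p_speech / (p_babble * 10.0 ** (snr_db / 10.0)))
    return speech + gain * babble


# Usage with synthetic stand-ins for the recordings.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(4000)
mixed = mix_at_snr(clean, noise, snr_db=5.0)
```

Fixing the SNR in this way would make recognition scores comparable across mixtures, which is one common reason to scale the interfering signal rather than add it at its recorded level.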
