Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

N. Mesgarani, M. Slaney, S. A. Shamma

doi:10.1109/tsa.2005.858055

Abstract

We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). Generalization of the system to signals with high levels of additive noise and reverberation is evaluated and compared to two existing approaches (Scheirer and Slaney, 2002; Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.
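
The robustness evaluation described in the abstract depends on corrupting clean signals with additive noise at controlled SNRs. As a minimal sketch of that general idea (not the authors' procedure, which the abstract does not specify), the Python snippet below scales a noise signal so that the mixture reaches a target SNR, using SNR_dB = 10 log10(P_signal / P_noise). The function names `mean_power` and `mix_at_snr`, the 440 Hz sine standing in for a clean signal, the 16 kHz sample rate, and the Gaussian noise are all illustrative assumptions.

```python
import math
import random


def mean_power(x):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal after scaling it to the requested SNR in dB.

    Hypothetical helper for illustration only. Returns the noisy
    mixture and the gain that was applied to the noise.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Noise power needed so that p_signal / p_target = 10^(snr_db / 10).
    p_target = p_signal / (10 ** (snr_db / 10))
    gain = math.sqrt(p_target / p_noise)
    noisy = [s + gain * n for s, n in zip(signal, noise)]
    return noisy, gain


# Assumed toy data: one second of a 440 Hz tone at 16 kHz plus white noise.
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy, gain = mix_at_snr(signal, noise, snr_db=0.0)
noisy10, gain10 = mix_at_snr(signal, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range of values would yield a set of test conditions of the kind the abstract alludes to when it mentions low SNRs. Reverberation would need a separate step (for example, convolution with a room impulse response), which this sketch does not cover.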

Journal: IEEE Transactions on Audio, Speech and Language Processing
Publication Date: May 1, 2006
Citations: 260

Similar Papers

Speech discrimination based on multiscale spectro-temporal modulations
N Mesgarani ... S Shamma
17 May 2004

Speech Recognition in a Noisy and Reverberant Environment with and without Earmuffs
Eeva Pekkarinena ... Jouko Suonpä
International Journal of Audiology | VOL. 29
01 Jan 1990

Triple model of auditory sensory processing: a novel gating stream directly links primary auditory areas to executive prefrontal cortex.
Sanja Josef Golubić
Acta clinica Croatica | VOL. 59
01 Jan 2020

Noise Suppression of Computed Tomography (CT) Images Using Residual Encoder-Decoder Convolutional Neural Network (RED-CNN)
H B Cokrokusumo ... D S Soejoko
Atom Indonesia | VOL. 48
27 Nov 2022
