Abstract
Robust speech/non‐speech classification is an important step in a variety of speech processing applications. For example, in speech and speaker recognition systems designed to work in real‐world environments, robust discrimination of speech from other sounds is an essential pre‐processing step. Auditory‐based features at multiple scales of temporal and spectral resolution have been shown to be very useful for the speech/non‐speech classification task [Mesgarani et al., IEEE Trans. Speech Audio Process. 10, 504–516 (2002)]. These features are computed with a biologically inspired auditory model that maps a given sound to a high‐dimensional representation of its spectro‐temporal modulations, mimicking the stages of processing along the auditory pathway from the periphery to the primary auditory cortex. In this work, we analyze the contribution of different temporal and spectral modulations to robust speech/non‐speech classification. The results suggest that temporal modulations in the range 12–22 Hz and spectral modulations in the range 1.5–4 cycles/octave are particularly useful for achieving robustness in highly noisy and reverberant environments.
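To make the two modulation axes concrete, the following is a minimal sketch of how one might measure how much of a signal's spectro‐temporal modulation energy falls in the 12–22 Hz and 1.5–4 cycles/octave ranges. It is not the auditory model used in this work: it substitutes a plain 2‐D Fourier transform of a log‐frequency spectrogram for the cortical filter bank. The function name `modulation_band_fraction`, its parameters, and the assumption that the input channels are spaced uniformly on a log‐frequency (octave) axis are all illustrative.

```python
import numpy as np


def modulation_band_fraction(log_spec, frame_rate_hz, channels_per_octave,
                             rate_band=(12.0, 22.0), scale_band=(1.5, 4.0)):
    """Fraction of modulation energy inside a rate/scale band.

    log_spec: 2-D array (n_channels, n_frames); channels are ASSUMED to be
        uniformly spaced in octaves, frames uniformly spaced in time.
    frame_rate_hz: frames per second (sets the temporal modulation axis, Hz).
    channels_per_octave: channel density (sets the spectral axis, cycles/octave).
    """
    x = np.asarray(log_spec, dtype=float)
    x = x - x.mean()  # drop the DC term so it does not dominate the total

    # 2-D Fourier transform: axis 0 -> spectral modulation, axis 1 -> temporal.
    power = np.abs(np.fft.fft2(x)) ** 2

    n_channels, n_frames = x.shape
    scales = np.abs(np.fft.fftfreq(n_channels, d=1.0 / channels_per_octave))
    rates = np.abs(np.fft.fftfreq(n_frames, d=1.0 / frame_rate_hz))

    scale_mask = (scales >= scale_band[0]) & (scales <= scale_band[1])
    rate_mask = (rates >= rate_band[0]) & (rates <= rate_band[1])
    band_mask = scale_mask[:, None] & rate_mask[None, :]

    total = power.sum()
    return float(power[band_mask].sum() / total) if total > 0 else 0.0


def ripple(rate_hz, scale_cpo, n_channels=120, n_frames=100,
           frame_rate_hz=100.0, channels_per_octave=24.0):
    """Synthetic moving ripple with one temporal rate and one spectral scale."""
    t = np.arange(n_frames) / frame_rate_hz            # seconds
    f = np.arange(n_channels) / channels_per_octave    # octaves
    return np.cos(2 * np.pi * (rate_hz * t[None, :] + scale_cpo * f[:, None]))


if __name__ == "__main__":
    inside = modulation_band_fraction(ripple(16.0, 2.0), 100.0, 24.0)
    outside = modulation_band_fraction(ripple(4.0, 0.4), 100.0, 24.0)
    print(f"ripple at 16 Hz, 2.0 cyc/oct: {inside:.3f}")
    print(f"ripple at  4 Hz, 0.4 cyc/oct: {outside:.3f}")
```

A ripple placed inside both ranges should yield a fraction close to one, and a ripple outside them a fraction close to zero. Such a fraction is only a rough illustration of band selection on the two modulation axes, not a substitute for the multi‐scale auditory features analyzed here.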