Abstract

The intelligibility of speech can be significantly reduced when it is presented in adverse near-end listening conditions, such as background noise. Multiple approaches have been suggested to improve the perception of speech in such conditions. However, most of these approaches were designed to work with clean input speech. They therefore have serious limitations when deployed in real-world applications like telephony and hearing aids, where noisy input speech is quite common. In this paper we present an end-to-end neural network approach to this problem, which effectively reduces the input noise and improves intelligibility for listeners in adverse conditions. To that end, a convolutional neural network topology with variable dilation factors is proposed and evaluated in both a causal and a non-causal configuration, using raw speech as input. A Teacher-Student training strategy is employed, where the Teacher is a well-established speech-in-noise intelligibility enhancer based on spectral shaping followed by dynamic range compression (SSDRC). The evaluation is performed both objectively, using the speech intelligibility in bits (SIIB) metric, and subjectively, on the Greek Harvard corpus. A noise-robust multi-band version of SSDRC was used as the baseline. Compared with the baseline, at 0 dB input SNR, the suggested neural network system achieved relative SIIB improvements of about 380% and 230% in fluctuating and stationary backgrounds, respectively. Subjectively, compared with the baseline, the suggested model increased listeners’ keyword correct rate in stationary noise from 25% to 60% at 0 dB input SNR, and from about 52% to 75% at 5 dB input SNR.
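
The difference between the causal and non-causal configurations of a dilated convolution can be illustrated with a minimal NumPy sketch. This is not the network described in the abstract: the function names, the kernel size of 3, and the dilation schedule (1, 2, 4, 8) are assumptions chosen purely for illustration, since the abstract does not state them.

```python
import numpy as np

def dilated_conv1d(x, w, dilation, causal):
    """Dilated 1-D convolution (correlation form), output length == input length.

    Hypothetical helper for illustration only; not the authors' layer.
    """
    k = len(w)
    span = (k - 1) * dilation        # context covered besides the current sample
    if causal:
        left, right = span, 0        # pad the past only: no look-ahead
    else:
        left = span // 2             # split context between past and future
        right = span - left
    xp = np.pad(x, (left, right))    # zero padding
    y = np.zeros(len(x))
    for i, wi in enumerate(w):
        start = i * dilation
        y += wi * xp[start:start + len(x)]
    return y

def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated conv layers, stride 1."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Impulse at sample 5: see which output samples it influences.
x = np.zeros(16)
x[5] = 1.0
w = np.ones(3)
print(np.nonzero(dilated_conv1d(x, w, 2, causal=True))[0])   # [5 7 9]
print(np.nonzero(dilated_conv1d(x, w, 2, causal=False))[0])  # [3 5 7]
print(receptive_field(3, [1, 2, 4, 8]))                      # 31
```

In the causal case the impulse affects only the current and later outputs, which is what a low-latency application such as telephony would need. In the non-causal case it also affects output sample 3, meaning the layer looks ahead at future input. Under the assumed schedule, four stacked layers cover 31 samples, showing how increasing dilation widens the context without enlarging the kernel.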
