Predicting speech intelligibility with deep neural networks

Constantin Spille, Stephan D Ewert, Birger Kollmeier, Bernd T Meyer

doi:10.1016/j.csl.2017.10.004

Abstract

An accurate objective prediction of human speech intelligibility is of interest for many applications, such as the evaluation of signal processing algorithms. To predict the speech recognition threshold (SRT) of normal-hearing listeners, an automatic speech recognition (ASR) system is employed that uses a deep neural network (DNN) to convert the acoustic input into phoneme predictions, which are subsequently decoded into word transcripts. ASR results are obtained with, and compared to, the data presented in Schubotz et al. (2016), which comprise eight different additive maskers, ranging from speech-shaped stationary noise to a single-talker interferer, and responses from eight normal-hearing subjects. The task for both listeners and ASR is to identify noisy words from a German matrix sentence test in monaural conditions. Two ASR training schemes typically used in applications are considered: (A) matched training, which uses the same noise type for training and testing, and (B) multi-condition training, which covers all eight maskers. For both training schemes, ASR-based predictions outperform established measures such as the extended speech intelligibility index (ESII), the multi-resolution speech envelope power spectrum model (mr-sEPSM) and others. This result is obtained with a speaker-independent model that compares the word labels of the utterance with the ASR transcript and therefore does not require separate noise and speech signals. The best predictions are obtained for multi-condition training with amplitude modulation features, a scheme in which the test noise type has been seen during training. Predictions and measurements are analyzed by comparing speech recognition thresholds and individual psychometric functions to the DNN-based results.
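The scoring idea described in the abstract, comparing the word labels of an utterance with the ASR transcript and reading a threshold off the resulting recognition rates, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the position-wise word comparison, the 50% target rate, and the use of linear interpolation in place of a fitted psychometric function are not taken from the paper, and the sentence and rates in the usage note are made up.

```python
from typing import Sequence


def word_recognition_rate(reference: Sequence[str], transcript: Sequence[str]) -> float:
    """Fraction of reference words reproduced at the same position.

    Assumption: matrix sentences have a fixed word order, so a
    position-wise comparison is used instead of an edit-distance alignment.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    hits = sum(
        1
        for i, word in enumerate(reference)
        if i < len(transcript) and transcript[i] == word
    )
    return hits / len(reference)


def estimate_srt(snrs_db: Sequence[float], rates: Sequence[float], target: float = 0.5) -> float:
    """SNR (dB) at which the recognition rate crosses `target`.

    Assumption: linear interpolation between neighbouring measurement
    points stands in for a fitted psychometric function.
    """
    points = sorted(zip(snrs_db, rates))
    for (snr_lo, rate_lo), (snr_hi, rate_hi) in zip(points, points[1:]):
        if rate_lo <= target <= rate_hi and rate_hi > rate_lo:
            fraction = (target - rate_lo) / (rate_hi - rate_lo)
            return snr_lo + fraction * (snr_hi - snr_lo)
    raise ValueError("target rate is not bracketed by the data")
```

As a usage example, `word_recognition_rate(["Peter", "kauft", "drei", "nasse", "Autos"], ["Peter", "kauft", "acht", "nasse", "Autos"])` scores four of five words as correct, and `estimate_srt([-12, -8, -4, 0], [0.1, 0.3, 0.7, 0.9])` places the 50% point between the -8 dB and -4 dB measurements. The same two functions could be applied to listener responses and to ASR transcripts alike, which is what makes the two sets of thresholds comparable.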

Journal: Computer Speech & Language	Publication Date: Oct 25, 2017
Citations: 70
