Prediction of speech intelligibility with DNN-based performance measures

Angel Mario Castro Martinez, Constantin Spille, Jana Roßbach, Birger Kollmeier, Bernd T Meyer

doi:10.1016/j.csl.2021.101329

Open Access

Abstract

This paper presents a speech intelligibility model based on automatic speech recognition (ASR) that combines phoneme probabilities from deep neural networks (DNNs) with a performance measure that estimates the word error rate from these probabilities. The model requires neither the clean speech reference nor the word labels during testing, because the ASR decoding step, which finds the most likely word sequence given the phoneme posterior probabilities, is omitted. The model is evaluated via the root-mean-squared error between the predicted speech reception thresholds (SRTs) and those observed for eight normal-hearing listeners. The recognition task consists of identifying words in noise from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from stationary speech-shaped noise to a single-talker masker. The prediction performance is compared to five established models and to an ASR model that uses word labels. Two combinations of features and networks were tested. Both include temporal information, either at the feature level (amplitude modulation filterbanks with a feed-forward network) or through the architecture (mel-spectrograms with a time-delay deep neural network, TDNN). The TDNN model performs on par with the DNN while reducing the number of parameters by a factor of 37; this reduction allows parallel streams to run on dedicated hearing-aid hardware, since a forward pass can be computed within the 10 ms of each frame. The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.
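The evaluation described above compares predicted and observed SRTs via their root-mean-squared error. The sketch below illustrates that last step under stated assumptions: it assumes the SRT is read off a model-predicted word-recognition-rate curve at the 50% point (a common convention for matrix sentence tests, not a detail given in this abstract) and uses simple linear interpolation between measured SNRs, whereas the paper may fit a psychometric function instead. The helper names `srt_from_curve` and `rmse` are hypothetical, and the numbers are illustrative, not data from the study.

```python
import math
from typing import Sequence


def srt_from_curve(snrs_db: Sequence[float], word_rates: Sequence[float],
                   target: float = 0.5) -> float:
    """SNR (dB) at which an intelligibility curve reaches `target`.

    Hypothetical helper: linearly interpolates between the two measured SNRs
    that bracket the target word-recognition rate. Assumes snrs_db is sorted
    in ascending order and word_rates rises with SNR.
    """
    points = list(zip(snrs_db, word_rates))
    for (s0, r0), (s1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1 and r1 > r0:
            return s0 + (target - r0) * (s1 - s0) / (r1 - r0)
    raise ValueError("target rate is not bracketed by the measured points")


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-squared error between predicted and observed SRTs (dB)."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative numbers only, not results from the paper.
snrs = [-15.0, -10.0, -5.0, 0.0]
predicted_rates = [0.1, 0.3, 0.7, 0.9]
srt_pred = srt_from_curve(snrs, predicted_rates)  # about -7.5 dB
print(rmse([srt_pred, -10.0], [-7.0, -11.0]))
```

In practice one SRT would be predicted per noise masker, and the RMSE would be taken across all maskers against the listeners' observed SRTs.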

Journal: Computer Speech & Language
Publication Date: Dec 24, 2021
Citations: 7

Similar Papers

Predicting speech intelligibility with deep neural networks
Constantin Spille ... Bernd T Meyer
Computer Speech & Language | VOL. 48
25 Oct 2017

Non-Intrusive Binaural Prediction of Speech Intelligibility Based on Phoneme Classification
Jana Rosbach ... Saskia Rottges
06 Jun 2021

Measurement and prediction of speech intelligibility in noise and reverberation for different sentence materials, speakers, and languages
Anna Warzybok ... Birger Kollmeier
The Journal of the Acoustical Society of America | VOL. 136
01 Oct 2014

Microscopic and Blind Prediction of Speech Intelligibility: Theory and Practice.
Mahdie Karbasi ... Dorothea Kolossa
IEEE/ACM transactions on audio, speech, and language processing | VOL. 30
01 Jan 2021