Abstract

The effect of additive white Gaussian noise and high-pass filtering on speech intelligibility at signal-to-noise ratios (SNRs) from -26 to 0 dB was evaluated using British English talkers and normal hearing listeners. SNRs below -10 dB were considered as they are relevant to speech security applications. Eight objective metrics were assessed: short-time objective intelligibility (STOI), a proposed variant termed STOI+, extended short-time objective intelligibility (ESTOI), normalised covariance metric (NCM), normalised subband envelope correlation metric (NSEC), two metrics derived from the coherence speech intelligibility index (CSII), and an envelope-based regression method speech transmission index (STI). For speech and noise mixtures associated with intelligibility scores ranging from 0% to 98%, STOI+ performed at least as well as other metrics and, under some conditions, better than STOI, ESTOI, STI, NSEC, CSIIMid, and CSIIHigh. Both STOI+ and NCM were associated with relatively low prediction error and bias for intelligibility prediction at SNRs from -26 to 0 dB. STI performed least well in terms of correlation with intelligibility scores, prediction error, bias, and reliability. Logistic regression modeling demonstrated that high-pass filtering, which increases the proportion of high to low frequency energy, was detrimental to intelligibility for SNRs between -5 and -17 dB inclusive.
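The abstract compares metrics by their correlation with intelligibility scores, prediction error, and bias. As a rough illustration only, the sketch below computes three common versions of such measures: Pearson correlation, root-mean-square error, and mean signed error. These definitions are an assumption, since the abstract does not state the formulas the study used, and the numbers are hypothetical placeholders rather than data from the paper. Reliability is not covered.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, measured):
    """Root-mean-square prediction error, in the same units as the scores."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


def bias(predicted, measured):
    """Mean signed error; positive values mean over-prediction on average."""
    return sum(p - m for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical percentage-correct word scores, NOT data from the study.
measured = [0.0, 10.0, 35.0, 70.0, 98.0]
# Hypothetical metric outputs already mapped to percentage correct.
predicted = [2.0, 14.0, 30.0, 74.0, 95.0]

r = pearson_r(predicted, measured)
error = rmse(predicted, measured)
offset = bias(predicted, measured)
```

A metric can correlate strongly with listener scores and still over- or under-predict them systematically, which is one reason to report error and bias alongside correlation.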

Highlights

  • Speech communication can be impaired in adverse conditions such as those involving interfering noise, excessive reverberation, and distortion of the transmission channel

  • Eight objective metrics were assessed: short-time objective intelligibility (STOI), a proposed variant termed STOI+, extended short-time objective intelligibility (ESTOI), normalised covariance metric (NCM), normalised subband envelope correlation metric (NSEC), two metrics derived from the coherence speech intelligibility index (CSII), and an envelope-based regression method speech transmission index (STI)

  • The high-pass filter (HPF) is detrimental to speech intelligibility for -17 < signal-to-noise ratios (SNRs) < -5 dB. These results suggest that, when speech is mixed with white Gaussian noise (WGN) at these global SNRs, the local SNR is not sufficiently improved by the HPF at higher speech frequencies, i.e., within the range of the second and third formants, to increase intelligibility for the average listener
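The term "global SNR" above can be made concrete with a minimal sketch of mixing a signal with white Gaussian noise at a target SNR computed over the whole signal. This is one plausible reading rather than the study's procedure: the function names are hypothetical, the speech is replaced by a synthetic tone, and the excerpt does not say whether the study measured speech power over the full recording or over active speech only.

```python
import math
import random


def power(x):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def global_snr_db(signal, noise):
    """Global SNR in dB: power ratio computed over the entire sequences."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_global_snr(speech, target_snr_db, rng):
    """Add white Gaussian noise to `speech` at the requested global SNR.

    Unit-variance noise is drawn first, then rescaled so that this
    particular noise realisation gives exactly the target SNR.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    target_noise_power = power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Hypothetical stand-in for a speech waveform: a 200 Hz tone at 16 kHz.
fs = 16000
speech = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]

rng = random.Random(0)  # fixed seed so the example is repeatable
mixture, noise = mix_at_global_snr(speech, -26.0, rng)
achieved = global_snr_db(speech, noise)
```

Because white noise has a flat spectrum while speech energy falls off with frequency, the local SNR in a given band can differ substantially from the global figure.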


Summary

Introduction

Speech communication can be impaired in adverse conditions such as those involving interfering noise, excessive reverberation, and distortion of the transmission channel. In the field of speech security, where there is a need to assess the risk of only a few words being intelligible when overheard or covertly intercepted, typically, the aim is to identify percentage correct word scores that are

