Abstract

A major determinant of success in separating a speech signal from a noisy environment is the intelligibility of the extracted speech. Intelligibility is best measured as the fraction of words correctly recognized by listeners, but computational measures are often preferred because they are less labor-intensive than collecting listener judgements. We compare listener intelligibility data to three acoustically derived measures: (a) SNRs estimated from the processed mixtures, (b) coherence, and (c) the speech-based Speech Transmission Index (sSTI). Sentences were recorded by four microphones against restaurant babble, white Gaussian noise, and nonstationary noise at SNRs ranging from +4 dB to −8 dB. Processing conditions included (1) the original mixture; (2) the mixture processed by a critically determined 4-channel blind source separation (BSS) algorithm; (3) the mixture processed by an underdetermined 2-channel BSS algorithm; (4) the residual after subtracting noise estimates obtained with a least-mean-square (LMS) algorithm; (5) an estimate of speech extracted by first using an LMS algorithm to remove the other two noises (white Gaussian and nonstationary) and then applying 2-channel BSS to separate the sentences from the babble; and (6) pristine speech recorded with no noise. The computational measures are compared to gold-standard intelligibility results from listening tests.
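The abstract does not describe how the LMS stage or the SNR estimate is configured. The following is a minimal sketch of a textbook Widrow-style LMS adaptive noise canceller, as one plausible reading of condition (4). It assumes a speech-free noise reference channel, uses hypothetical parameter values (`num_taps`, `mu`), and computes SNR against a known clean signal, which a real recording would not provide. It is not the authors' implementation and does not cover coherence or sSTI.

```python
import numpy as np


def lms_cancel(primary, reference, num_taps=16, mu=0.005):
    """Textbook LMS adaptive noise canceller (illustrative sketch).

    primary:   speech plus filtered noise, e.g. one microphone channel
    reference: noise-correlated signal, ASSUMED to contain no speech
    num_taps, mu: hypothetical filter length and step size
    Returns the residual e[n] = primary[n] - w^T x[n], which serves as
    the speech estimate once the adaptive filter has converged.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)       # adaptive FIR weights
    x_buf = np.zeros(num_taps)   # recent reference samples, newest first
    out = np.empty_like(primary)
    for n in range(len(primary)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = reference[n]
        noise_est = w @ x_buf            # current noise estimate
        e = primary[n] - noise_est       # residual = speech estimate
        w += 2.0 * mu * e * x_buf        # LMS (stochastic gradient) update
        out[n] = e
    return out


def snr_db(clean, estimate):
    """SNR of an estimate relative to a known clean signal, in dB."""
    clean = np.asarray(clean, dtype=float)
    err = np.asarray(estimate, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))


# Synthetic demo: a sinusoid stands in for speech, white Gaussian noise
# reaches the primary channel through a short (hypothetical) FIR path.
rng = np.random.default_rng(0)
n = 8000
clean = np.sin(2 * np.pi * 0.01 * np.arange(n))
noise = rng.standard_normal(n)
primary = clean + np.convolve(noise, [0.6, -0.3, 0.1])[:n]
enhanced = lms_cancel(primary, noise)

half = n // 2  # score only the second half, after the filter converges
snr_before = snr_db(clean[half:], primary[half:])
snr_after = snr_db(clean[half:], enhanced[half:])
print(f"SNR before: {snr_before:.1f} dB, after LMS: {snr_after:.1f} dB")
```

The step size trades convergence speed against residual noise: a smaller `mu` leaves less excess error after convergence but adapts more slowly, which matters for the nonstationary noise condition.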
