A non-intrusive method for estimating binaural speech intelligibility from noise-corrupted signals captured by a pair of microphones

Yan Tang, Qingju Liu, Wenwu Wang, Trevor J Cox

doi:10.1016/j.specom.2017.12.005

Open Access

Abstract

A non-intrusive method is introduced to predict binaural speech intelligibility in noise directly from signals captured using a pair of microphones. The approach combines signal processing techniques for blind source separation and localisation with an intrusive objective intelligibility measure (OIM). As a result, unlike classic intrusive OIMs, this method requires neither a clean reference speech signal nor knowledge of the source locations to operate. The proposed approach is able to estimate intelligibility in stationary and fluctuating noises when the noise masker is presented as a point or diffuse source and is spatially separated from the target speech source in the horizontal plane. The performance of the proposed method was evaluated in two rooms. When predicting subjective intelligibility measured as word recognition rate, the method showed reasonable predictive accuracy, with correlation coefficients above 0.82, comparable to that of a reference intrusive OIM in most of the conditions. The proposed approach offers a solution for fast binaural intelligibility prediction and therefore has practical potential for deployment in situations where on-site speech intelligibility is a concern.
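The abstract reports predictive accuracy as correlation coefficients between predicted and measured intelligibility. A minimal Python sketch of that evaluation step is given below, assuming a plain Pearson correlation computed over per-condition pairs of OIM scores and word recognition rates. The function name `pearson_r` and the example numbers are hypothetical and are not data from the study; whether the authors first map OIM scores through a fitting function is not stated in this section.

```python
import numpy as np


def pearson_r(predicted, measured):
    """Pearson correlation between objective scores and subjective word recognition rates."""
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(measured, dtype=float)
    # Centre both vectors, then take their normalised inner product.
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))


# Hypothetical per-condition values, for illustration only (not results from the paper).
oim_scores = [0.21, 0.35, 0.48, 0.60, 0.74]
word_recognition = [0.18, 0.40, 0.45, 0.66, 0.80]
print(round(pearson_r(oim_scores, word_recognition), 3))
```

The same value can be obtained with `np.corrcoef(oim_scores, word_recognition)[0, 1]`; the explicit version just makes the centring and normalisation visible.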

Highlights

Objective intelligibility measures (OIMs) have been widely used in place of subjective listening tests for speech intelligibility evaluation, due to their fast and cheap operation and the reliable feedback they provide
The two-channel mixtures are fed as inputs into a blind source localisation (BSL) model (Section 2.2), which calculates the approximate locations of the speech, θs′, and the masker, θn′; these locations are used to estimate the head-induced interaural level differences (ILD) of the binaural signals
Because the signals captured by the microphones do not contain head shadowing, this effect needs to be modelled in the binaural signals using the estimated ILD (Section 2.3) before they are passed to the intrusive OIM for intelligibility prediction
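The highlights describe two processing steps ahead of the intrusive OIM: localising the sources and imposing an ILD to stand in for head shadowing. The BSL model (Section 2.2) and the location-to-ILD mapping (Section 2.3) are not detailed here, so the Python sketch below swaps in simpler, clearly different techniques under stated assumptions. GCC-PHAT, a standard time-difference-of-arrival estimator, converts the inter-microphone delay into a far-field azimuth, and a single broadband gain imposes an ILD whose value is simply assumed. All function names, the 343 m/s speed of sound and the 0.2 m microphone spacing are hypothetical. A single GCC-PHAT peak locates only the dominant source, whereas the paper estimates both θs′ and θn′, and real head shadowing is strongly frequency dependent, so a closer approximation would apply a separate ILD in each frequency band.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def gcc_phat_delay(x_left, x_right, fs, max_delay_s):
    """Estimate how far x_right lags x_left, in seconds, using GCC-PHAT.

    A stand-in for blind source localisation; this is NOT the paper's BSL model.
    """
    n = len(x_left) + len(x_right)  # zero-pad so the correlation is linear, not circular
    X_l = np.fft.rfft(x_left, n=n)
    X_r = np.fft.rfft(x_right, n=n)
    cross = X_r * np.conj(X_l)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = max(1, int(round(max_delay_s * fs)))
    # Place lags -max_shift..+max_shift next to each other in one array.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


def delay_to_azimuth(tau, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field azimuth in degrees: 0 is broadside, positive is toward the left microphone.

    A two-microphone pair cannot tell front from back.
    """
    s = np.clip(c * tau / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


def apply_ild(left, right, ild_db):
    """Impose a broadband ILD: ild_db > 0 makes the left channel louder by ild_db decibels.

    The gain is split evenly between the ears (+ild_db/2 on the left, -ild_db/2 on the right).
    """
    g = 10.0 ** (ild_db / 40.0)  # amplitude ratio for half of the ILD
    return np.asarray(left, dtype=float) * g, np.asarray(right, dtype=float) / g


def measured_ild_db(left, right):
    """Energy-based broadband ILD in dB (left relative to right)."""
    return float(10.0 * np.log10(np.sum(np.square(left)) / np.sum(np.square(right))))


# Synthetic example: white noise reaching the right microphone 5 samples later.
fs = 16000
rng = np.random.default_rng(0)
left = rng.standard_normal(4000)
right = np.concatenate((np.zeros(5), left[:-5]))
tau = gcc_phat_delay(left, right, fs, max_delay_s=0.2 / SPEED_OF_SOUND)
azimuth = delay_to_azimuth(tau, mic_spacing_m=0.2)

# Assumed 6 dB ILD, applied to a diotic copy so the measured ILD equals the imposed one.
l_out, r_out = apply_ild(left, left, ild_db=6.0)
print(tau * fs, azimuth, measured_ild_db(l_out, r_out))
```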

Summary

Introduction

Objective intelligibility measures (OIMs) have been widely used in place of subjective listening tests for speech intelligibility evaluation, due to their fast and cheap operation and the reliable feedback they provide. In strictly controlled or experimental conditions, the clean speech signal is usually known and accessible, so intelligibility estimation can be readily performed using an intrusive OIM. In situations such as live broadcasting in public crowds, where the speech signal has already been contaminated by non-target background sounds or the clean speech reference is not available, predicting intelligibility becomes problematic. This greatly limits the use of intrusive OIMs. In contrast to intrusive OIMs, those which operate directly on noise-corrupted speech signals are known as non-intrusive OIMs.
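The difference between intrusive and non-intrusive OIMs can be expressed as a difference in function signatures: an intrusive measure needs separate speech and masker (or clean reference) signals, whereas a non-intrusive one sees only the observed mixture. The proposed method bridges the two by estimating the missing signals from the mixture before calling an intrusive OIM. The Python sketch below shows that wrapping pattern under stated assumptions: `toy_glimpse_proportion` is a deliberately crude, hypothetical stand-in for an intrusive OIM (a count of time frames whose local SNR exceeds a threshold, not any published measure), and the separator is any callable that returns speech and masker estimates. Intrusive OIMs that instead take a (clean, degraded) pair would be fed the speech estimate and the mixture.

```python
from typing import Callable, Tuple

import numpy as np

Signal = np.ndarray
IntrusiveOIM = Callable[[Signal, Signal], float]        # (speech, masker) -> score
Separator = Callable[[Signal], Tuple[Signal, Signal]]   # mixture -> (speech_est, masker_est)


def toy_glimpse_proportion(speech, masker, frame_len=256, threshold_db=3.0):
    """Hypothetical toy intrusive measure: fraction of frames with local SNR above threshold_db."""
    n_frames = min(len(speech), len(masker)) // frame_len
    s = np.asarray(speech[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    m = np.asarray(masker[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    local_snr_db = 10.0 * np.log10((np.sum(s**2, axis=1) + 1e-12) / (np.sum(m**2, axis=1) + 1e-12))
    return float(np.mean(local_snr_db > threshold_db))


def make_non_intrusive(oim: IntrusiveOIM, separate: Separator) -> Callable[[Signal], float]:
    """Turn an intrusive OIM into a predictor that needs only the observed mixture."""
    def predict(mixture: Signal) -> float:
        speech_est, masker_est = separate(mixture)  # blind source separation step (assumed callable)
        return oim(speech_est, masker_est)
    return predict


# Illustration only: an "oracle" separator that cheats by already knowing both sources.
rng = np.random.default_rng(2)
speech = rng.standard_normal(4096)
masker = 0.1 * rng.standard_normal(4096)


def oracle_separator(mixture):
    return speech, masker


predict = make_non_intrusive(toy_glimpse_proportion, oracle_separator)
print(predict(speech + masker))
```

In a real system the oracle would be replaced by a blind separation front end, and the quality of its estimates then bounds how closely the wrapped measure can track the intrusive one.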

Objectives

Methods

Results

Discussion

Conclusion