Using a single-channel reference with the MBSTOI binaural intelligibility metric

Pierre Guiraud, Alastair H Moore, Rebecca R Vos, Patrick A Naylor, Mike Brookes

doi:10.1016/j.specom.2023.03.005

Open Access

https://doi.org/10.1016/j.specom.2023.03.005

Journal: Speech Communication
Publication Date: Mar 11, 2023
License type: cc-by

Affiliation: Imperial College London

Abstract

Intrusive speech intelligibility metrics are typically used to assess the intelligibility of a target signal in a noisy environment. They require a clean reference signal, which can be difficult to obtain, especially for binaural metrics such as the modified binaural short-time objective intelligibility metric (MBSTOI). We present a hybrid version of MBSTOI that incorporates a deep learning stage, allowing the metric to be computed with only a single-channel clean reference signal. The models presented are trained on simulated data containing target speech, localised noise, diffuse noise, and reverberation. The hybrid metric outputs are then compared directly with those of MBSTOI to assess performance, and the results show how the single-channel-reference version performs relative to MBSTOI. The outcome of this work offers a fast and flexible way to generate audio data for machine learning (ML) and highlights the potential for low-level implementation of ML in existing tools.

Full Text