Abstract

In order to assess the intelligibility of a target signal in a noisy environment, intrusive speech intelligibility metrics are typically used. They require a clean reference signal to be available which can be difficult to obtain especially for binaural metrics like the modified binaural short time objective intelligibility metric (MBSTOI). We here present a hybrid version of MBSTOI that incorporates a deep learning stage that allows the metric to be computed with only a single-channel clean reference signal. The models presented are trained on simulated data containing target speech, localised noise, diffuse noise, and reverberation. The hybrid output metrics are then compared directly to MBSTOI to assess performances. Results show the performance of our single channel reference vs MBSTOI. The outcome of this work offers a fast and flexible way to generate audio data for machine learning (ML) and highlights the potential for low level implementation of ML into existing tools.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call