Abstract

A recent trend in deep neural network (DNN)-based speech enhancement is to use intelligibility and quality metrics as loss functions for model training, with the aim of achieving high subjective speech intelligibility and perceptual quality in real-life conditions. In this study, we analyze a variety of loss functions, including some based on state-of-the-art intelligibility and quality metrics, for training an end-to-end speech enhancement system based on a fully convolutional neural network. The loss functions include perceptual metric for speech quality evaluation (PMSQE), scale-invariant signal-to-distortion ratio (SI-SDR), SI-SDR integrating speech pre-emphasis, short-time objective intelligibility (STOI), extended STOI (ESTOI), spectro-temporal glimpsing index (STGI), and a composite loss function combining STGI and SI-SDR. DNNs trained with these loss functions produce notable speech intelligibility (and quality) gains according to the pertinent objective metrics. However, a subjective intelligibility test that we conducted contradicts this result, showing no intelligibility improvement. From the results of this study, our conclusion is twofold: (1) subjective intelligibility evaluation currently cannot be replaced by objective intelligibility evaluation, and (2) the development of both meaningful intelligibility metrics and DNN-based speech enhancement systems that consistently improve the intelligibility of noisy speech for human listeners remains an open problem.
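
To make one of the listed loss functions concrete, the following is a minimal NumPy sketch of SI-SDR used as a training loss, following the commonly used definition (project the estimate onto the clean target, then compare the energy of the projection with the energy of the residual). It is an illustration under stated assumptions, not the study's implementation: the function names `si_sdr` and `si_sdr_loss`, the mean removal step, and the `eps` constant are all assumptions, and the pre-emphasis variant is not shown because its filter is not specified in the abstract.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    Assumption: both inputs are 1-D waveforms of equal length.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the DC offset (a common convention, assumed here).
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Optimal scaling of the target, which makes the measure scale-invariant.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection

    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_loss(estimate, target):
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, target)


# Usage example with synthetic signals (hypothetical data, not from the study).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
enhanced = clean + 0.1 * noise

print(f"SI-SDR: {si_sdr(enhanced, clean):.2f} dB")
print(f"Loss:   {si_sdr_loss(enhanced, clean):.2f}")
```

In an actual training setup the same computation would be written with a differentiable array library so that gradients can flow through it. The point made in the abstract still applies: a lower value of such a loss does not by itself guarantee improved intelligibility for human listeners.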
