Improving Statistical Model-Based Speech Enhancement with Deep Neural Networks

Bengt J Borgstrom, Michael S Brandstein, Robert B Dunn

doi:10.1109/iwaenc.2018.8521382

Abstract

This paper presents a framework for improving the performance of statistical model-based single-channel speech enhancement systems by using a deep neural network (DNN). A DNN is trained to predict speech presence in the input signal, and this information is leveraged to design novel methods for noise tracking and a priori signal-to-noise ratio (SNR) estimation, which remain the most challenging tasks in conventional systems. Relative to end-to-end DNN-based systems, the proposed framework provides increased flexibility in aspects of system design such as gain estimation. Additionally, the DNN can be trained to detect speech in the presence of both noise and reverberation, leading to joint suppression of additive noise and reverberation. The proposed framework provides significant improvements in objective speech quality metrics relative to baseline systems, and the proposed system was heavily favored in a subjective preference test.
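As a rough illustration of how a speech presence estimate can feed a statistical model-based enhancer, the sketch below processes one short-time frame. It is not the method proposed in the paper, whose details are not given in the abstract. It assumes a per-bin speech presence probability `spp` is already available (for example, from a DNN), and it swaps in textbook components: recursive noise smoothing weighted by speech presence, the decision-directed a priori SNR estimate, and a Wiener gain. The function name `enhance_frame` and the parameters `alpha_n`, `alpha_dd`, and `xi_min` are hypothetical.

```python
import numpy as np


def enhance_frame(noisy_psd, spp, noise_psd, prev_clean_psd,
                  alpha_n=0.9, alpha_dd=0.98, xi_min=1e-3):
    """Process one frame of per-bin power spectra (illustrative sketch only).

    noisy_psd      : |Y|^2 for the current frame
    spp            : assumed speech presence probability in [0, 1] per bin
    noise_psd      : noise power estimate carried over from the previous frame
    prev_clean_psd : clean speech power estimate from the previous frame
    """
    eps = 1e-12  # guards against division by zero

    # Noise tracking: the smoothing factor approaches 1 where speech is
    # likely present, so the noise estimate is frozen in those bins.
    alpha_eff = alpha_n + (1.0 - alpha_n) * spp
    noise_psd = alpha_eff * noise_psd + (1.0 - alpha_eff) * noisy_psd

    # A posteriori SNR from the updated noise estimate.
    gamma = noisy_psd / np.maximum(noise_psd, eps)

    # Decision-directed a priori SNR estimate, floored at xi_min.
    xi = (alpha_dd * prev_clean_psd / np.maximum(noise_psd, eps)
          + (1.0 - alpha_dd) * np.maximum(gamma - 1.0, 0.0))
    xi = np.maximum(xi, xi_min)

    # Wiener gain; other gain rules could be substituted here.
    gain = xi / (1.0 + xi)
    clean_psd = gain ** 2 * noisy_psd
    return gain, noise_psd, clean_psd


# Two frequency bins: speech judged present in the first, absent in the second.
gain, noise_out, clean_out = enhance_frame(
    noisy_psd=np.array([4.0, 2.0]),
    spp=np.array([1.0, 0.0]),
    noise_psd=np.array([1.0, 1.0]),
    prev_clean_psd=np.array([0.0, 0.0]),
)
```

Because the gain rule is a separate final step, it can be replaced without touching the speech presence estimator, which is the kind of design flexibility the abstract contrasts with end-to-end systems.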

Full Text