IRM with Phase Parameterization for Speech Enhancement

Xianyun Wang, Changchun Bao, Rui Cheng

doi:10.1109/waspaa.2019.8937085

Abstract

Deep neural networks (DNNs) have become a popular means of separating target speech from noisy speech in supervised speech enhancement, owing to their strong ability to learn higher-level information. For DNN-based methods, the training target and the acoustic features have a significant impact on speech restoration performance. The ideal ratio mask (IRM) is commonly used as the training target; however, it generally does not take phase information into account. Recent studies have shown that incorporating phase information into the mask can effectively improve the quality of the enhanced speech. In this paper, a bounded IRM with phase parameterization is presented and used as the training target of the DNN model. In addition, acoustic features that preserve harmonic structure are incorporated into the DNN input as additional information to improve the quality of the enhanced speech. Experiments are performed under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the proposed method outperforms the reference methods.
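To make the difference between a magnitude-only training target and a phase-aware, bounded one concrete, the following minimal sketch computes both for a single short-time Fourier transform (STFT) bin. It assumes additive noise (noisy coefficient Y = S + N). The conventional IRM uses the common form (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5. The phase-aware function is an assumption modeled on the widely used phase-sensitive mask, |S|/|Y| * cos(theta_S - theta_Y), clipped to [0, 1]. It is not the paper's exact phase parameterization, which the abstract does not specify. The function names and the clipping range are hypothetical.

```python
import cmath
import math


def irm(speech: complex, noise: complex, beta: float = 0.5) -> float:
    """Conventional IRM for one time-frequency bin.

    (|S|^2 / (|S|^2 + |N|^2))^beta depends only on magnitudes,
    so the phase of speech and noise is ignored.
    """
    s2 = abs(speech) ** 2
    n2 = abs(noise) ** 2
    denom = s2 + n2
    if denom == 0.0:
        return 0.0
    return (s2 / denom) ** beta


def phase_aware_bounded_mask(speech: complex, noise: complex,
                             lower: float = 0.0, upper: float = 1.0) -> float:
    """Hypothetical phase-aware mask (phase-sensitive-mask style).

    |S| / |Y| * cos(theta_S - theta_Y), with Y = S + N (additive noise),
    clipped to [lower, upper] so the target stays bounded.
    """
    noisy = speech + noise
    if abs(noisy) == 0.0:
        return 0.0
    ratio = abs(speech) / abs(noisy)
    phase_diff = cmath.phase(speech) - cmath.phase(noisy)
    mask = ratio * math.cos(phase_diff)
    # Bound the target: partial cancellation can push the ratio above 1,
    # and a large phase mismatch can make the cosine term negative.
    return min(max(mask, lower), upper)


if __name__ == "__main__":
    s, n = 1 + 0j, 1j  # equal power, 90 degrees apart
    print(f"IRM: {irm(s, n):.4f}")
    print(f"phase-aware mask: {phase_aware_bounded_mask(s, n):.4f}")
```

For speech S = 1 and noise N = j, the IRM is about 0.707, whereas the phase-aware mask is 0.5, because the noisy phase is 45 degrees away from the clean phase and the cosine term discounts the mask accordingly. The clipping step is what keeps such a target bounded when noise partially cancels the speech (raw value above 1) or when the phase mismatch exceeds 90 degrees (raw value below 0).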

Full Text