Abstract

Perceptual evaluation of speech quality (PESQ) is widely accepted as an effective objective metric that correlates closely with speech quality as perceived by human listeners. Because PESQ is complex to evaluate and non-differentiable, it is difficult to include in the cost function for deep learning-based speech enhancement. In this paper, we focus on introducing PESQ to improve Deep Xi, a recently proposed minimum mean square error (MMSE)-based speech enhancement method in which the a priori signal-to-noise ratio (SNR) is estimated by a deep neural network. Treating discrete a priori SNR values as actions, we apply reinforcement learning (RL) to select the optimal SNR at the frame level through a PESQ-based reward function. The experimental results show that the RL-trained network achieves better PESQ scores, especially in low-SNR conditions.
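To make the action formulation concrete, here is a minimal sketch of selecting a discrete a priori SNR per frame with a policy-gradient update. It is not the authors' method. It rests on several assumptions. The action grid `SNR_ACTIONS_DB` (-10 to 20 dB in 5 dB steps) is hypothetical. The Wiener gain xi/(1 + xi) stands in for whichever MMSE gain function Deep Xi uses. The reward is replaced by `proxy_reward`, a hypothetical negative log-spectral distance, because PESQ itself is non-differentiable and requires an external implementation. The update rule is plain REINFORCE, chosen only as a generic policy-gradient example.

```python
import numpy as np

# Hypothetical discrete action set: candidate a priori SNR values in dB.
SNR_ACTIONS_DB = np.arange(-10.0, 25.0, 5.0)  # -10, -5, ..., 20 dB (7 actions)


def wiener_gain(xi_db):
    """Wiener gain G = xi / (1 + xi) for an a priori SNR given in dB.
    Assumption: a simple stand-in for an MMSE gain function."""
    xi = 10.0 ** (np.asarray(xi_db) / 10.0)
    return xi / (1.0 + xi)


def softmax(logits):
    """Row-wise softmax over the action axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def proxy_reward(clean_mag, enhanced_mag, eps=1e-8):
    """Hypothetical stand-in for a PESQ-based reward: negative RMS
    log-spectral distance between clean and enhanced magnitudes (<= 0)."""
    diff = np.log(clean_mag + eps) - np.log(enhanced_mag + eps)
    return -np.sqrt(np.mean(diff ** 2))


def reinforce_step(logits, noisy_mag, clean_mag, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update of per-frame logits over the SNR actions.

    logits: (frames, n_actions); noisy_mag, clean_mag: (frames, bins).
    Returns the updated logits, the sampled action indices, and the reward.
    """
    n_actions = len(SNR_ACTIONS_DB)
    probs = softmax(logits)
    # Sample one a priori SNR action per frame from the current policy.
    actions = np.array([rng.choice(n_actions, p=p) for p in probs])
    # Frame-level gain applied to every frequency bin of that frame.
    gains = wiener_gain(SNR_ACTIONS_DB[actions])[:, None]
    enhanced = gains * noisy_mag
    reward = proxy_reward(clean_mag, enhanced)
    # Gradient of log softmax w.r.t. the logits: one_hot(action) - probs.
    grad_log_pi = np.eye(n_actions)[actions] - probs
    # Gradient ascent on expected reward, with a baseline to reduce variance.
    new_logits = logits + lr * (reward - baseline) * grad_log_pi
    return new_logits, actions, reward
```

In this sketch each frame receives a single gain, matching the frame-level selection described above. Deep Xi itself estimates the a priori SNR per time-frequency bin, so a faithful implementation would differ. Applying the same loop with a real PESQ score would require computing the reward on the resynthesized waveform rather than on magnitudes.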
