Abstract

The short-time Fourier transform (STFT) is fundamental to speech processing. Because the STFT phase is highly unstructured and therefore difficult to process, most speech-processing algorithms operate only on the STFT magnitude, leaving the STFT phase largely unexplored. However, with the recent development of deep neural network (DNN)-based speech processing, e.g., speech enhancement and recognition, phase processing has become more important than ever, offering new room for improving DNN-based methods. In this paper, we propose a phase-aware, DNN-based speech enhancement algorithm. Specifically, in the training stage, when incorporating phase as a target, our core idea is to transform the unstructured phase spectrogram into its derivative along the time axis, i.e., the instantaneous frequency deviation (IFD), which has a structure similar to that of the corresponding magnitude spectrogram. We further propose to optimize IFD and magnitude jointly in a multiobjective learning framework. In the test stage, we propose a postprocessing method to recover the phase spectrogram from the estimated IFD. Experimental results demonstrate the effectiveness of the proposed method.
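
The phase-to-IFD transform and its inverse can be sketched in NumPy as follows. This is a minimal illustration, assuming the common definition of IFD as the frame-to-frame phase difference of bin k minus its expected advance 2*pi*k*R/N (hop size R, FFT size N), wrapped to [-pi, pi). The function names, the zero IFD assigned to the first frame, recovering phase by cumulative summation from a known first-frame phase, and the weighted-MSE joint loss with weight `alpha` are all hypothetical choices, not the paper's exact formulation or postprocessing method.

```python
import numpy as np


def wrap_phase(x):
    """Wrap angles (radians) into [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def expected_advance(n_bins, n_fft, hop):
    """Phase advance 2*pi*k*hop/n_fft of bin k between consecutive frames.

    Returned as a (n_bins, 1) column so it broadcasts across frames.
    """
    k = np.arange(n_bins)[:, None]
    return 2 * np.pi * k * hop / n_fft


def phase_to_ifd(phase, n_fft, hop):
    """Map a (bins, frames) phase spectrogram to its IFD.

    Assumption: frame 0 has no predecessor, so its IFD is set to zero and
    the first-frame phase must be stored separately to invert the mapping.
    """
    n_bins = phase.shape[0]
    dphi = np.diff(phase, axis=1) - expected_advance(n_bins, n_fft, hop)
    ifd = np.zeros_like(phase)
    ifd[:, 1:] = wrap_phase(dphi)
    return ifd


def ifd_to_phase(ifd, first_frame_phase, n_fft, hop):
    """Recover phase by accumulating (IFD + expected advance) along time.

    A hypothetical stand-in for the paper's postprocessing step.
    """
    n_bins = ifd.shape[0]
    increments = ifd[:, 1:] + expected_advance(n_bins, n_fft, hop)
    phase = np.empty_like(ifd)
    phase[:, 0] = first_frame_phase
    phase[:, 1:] = first_frame_phase[:, None] + np.cumsum(increments, axis=1)
    return wrap_phase(phase)


def joint_mse_loss(mag_est, mag_ref, ifd_est, ifd_ref, alpha=0.5):
    """Hypothetical multiobjective loss: magnitude MSE + alpha * IFD MSE."""
    mag_term = np.mean((mag_est - mag_ref) ** 2)
    ifd_term = np.mean((ifd_est - ifd_ref) ** 2)
    return mag_term + alpha * ifd_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_fft, hop = 512, 128
    phase = rng.uniform(-np.pi, np.pi, size=(n_fft // 2 + 1, 20))
    ifd = phase_to_ifd(phase, n_fft, hop)
    rec = ifd_to_phase(ifd, phase[:, 0], n_fft, hop)
    # Compare on the unit circle so a +/-2*pi ambiguity does not count as error;
    # the printed value is floating-point round-off only.
    print(np.max(np.abs(np.exp(1j * rec) - np.exp(1j * phase))))
```

Subtracting the expected advance is what removes the steep, bin-dependent phase ramp: for a stationary component centered on bin k the IFD is close to zero, so the IFD map tends to show structure that follows the spectral content, much like a magnitude spectrogram. Note that integrating an estimated IFD accumulates its errors over time, which is one reason a dedicated postprocessing step is needed in practice.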
