Abstract
The short-time Fourier transform (STFT) is fundamental to speech processing. Because the STFT phase is highly unstructured and difficult to process, most speech-processing algorithms operate only on the STFT magnitude, leaving the phase largely unexplored. However, with the recent development of deep neural network (DNN) based speech processing, e.g., speech enhancement and recognition, phase processing is becoming more important than ever as a new direction of growth for DNN-based methods. In this paper, we propose a DNN-based phase-aware speech enhancement algorithm. Specifically, in the training stage, when incorporating phase as a target, our core idea is to transform the unstructured phase spectrogram into its derivative along the time axis, i.e., the instantaneous frequency deviation (IFD), which has a structure similar to that of the corresponding magnitude spectrogram. We further propose to optimize IFD and magnitude jointly in a multiobjective learning framework. In the test stage, we propose a postprocessing method to recover the phase spectrogram from the estimated IFD. Experimental results demonstrate the effectiveness of the proposed method.
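The phase-to-IFD transform and its inverse can be illustrated with a minimal NumPy sketch. The abstract does not give the exact definition, so this sketch assumes a common one: the frame-to-frame phase difference, minus the phase advance expected at each bin's center frequency, wrapped to the principal interval. The function names (`wrap`, `phase_to_ifd`, `ifd_to_phase`), the STFT settings, and the test signal are hypothetical. The inverse here simply accumulates the IFD from a known first-frame phase; it is a stand-in for illustration, not the postprocessing method proposed in the paper.

```python
import numpy as np

def wrap(x):
    """Wrap angles to the principal interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def expected_advance(n_bins, hop, n_fft):
    """Phase advance per hop of a sinusoid at each bin's center frequency."""
    k = np.arange(n_bins)[:, None]
    return 2 * np.pi * k * hop / n_fft

def phase_to_ifd(phase, hop, n_fft):
    """Assumed IFD definition: wrapped time difference of the phase with the
    bin-center phase advance removed. phase has shape (n_bins, n_frames)."""
    delta = np.diff(phase, axis=1)
    ifd = wrap(delta - expected_advance(phase.shape[0], hop, n_fft))
    return ifd, phase[:, :1]  # keep the first frame for reconstruction

def ifd_to_phase(ifd, first_phase, hop, n_fft):
    """Illustrative inverse: add the bin-center advance back and accumulate
    along time, starting from a known first-frame phase (an assumption)."""
    steps = ifd + expected_advance(ifd.shape[0], hop, n_fft)
    later = first_phase + np.cumsum(steps, axis=1)
    return wrap(np.concatenate([first_phase, later], axis=1))

# Hypothetical demo: STFT of a pure tone, built directly with NumPy.
n_fft, hop = 64, 16
x = np.sin(2 * np.pi * 0.1 * np.arange(1024))
window = np.hanning(n_fft)
frames = np.stack(
    [x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)],
    axis=1,
)
spec = np.fft.rfft(frames, axis=0)  # shape (n_fft // 2 + 1, n_frames)
phase = np.angle(spec)

ifd, first = phase_to_ifd(phase, hop, n_fft)
rec = ifd_to_phase(ifd, first, hop, n_fft)
```

Because the wrapping only discards multiples of 2π, `rec` matches `phase` up to 2π ambiguities when the first-frame phase is known exactly. In an enhancement setting, both the IFD and the starting phase would be estimates, which is why a dedicated recovery step is needed.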
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing