Abstract

This paper presents a deep neural network (DNN)-based method for reconstructing phase from amplitude spectrograms. In speech processing, amplitude spectrograms are often the representation that is processed, and the corresponding phases are then reconstructed from them with the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic speech. To solve this problem, we propose directional-statistics DNNs that predict phases from amplitude spectrograms. We first propose the von Mises distribution DNN, a generative model that uses the von Mises distribution to model histograms of a periodic variable. We then extend it to model group delay, which has a stronger connection to the amplitude spectrogram. Furthermore, we generalize the group-delay modeling and propose another DNN, the sine-skewed generalized cardioid distribution DNN, for modeling asymmetric histograms such as those of group delay. Results from objective and subjective evaluations indicate that (1) our von Mises distribution DNN predicts group delay more accurately than it predicts phases, (2) our DNN works as a better initialization for the Griffin-Lim method, (3) the phase reconstruction methods based on our von Mises distribution DNN achieve better speech quality than the conventional Griffin-Lim method, and (4) our sine-skewed generalized cardioid distribution DNN models group delay more accurately than our von Mises distribution DNN.
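To make the two quantities named above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a fixed concentration parameter `kappa` and approximates group delay as the negative finite difference of phase along the frequency axis. The function names `von_mises_nll` and `group_delay` are hypothetical and chosen only for illustration.

```python
import numpy as np


def von_mises_nll(phase, mu, kappa=1.0):
    """Mean negative log-likelihood of `phase` under von Mises(mu, kappa).

    Density: exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa)).
    Assumption: kappa is a fixed scalar, not predicted by a network.
    """
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))
    return np.mean(-kappa * np.cos(phase - mu) + log_norm)


def group_delay(phase):
    """Finite-difference group delay along the last (frequency) axis.

    Assumption: group delay is approximated as the negative difference
    between adjacent frequency bins, wrapped to [-pi, pi).
    """
    diff = -(phase[..., 1:] - phase[..., :-1])
    return (diff + np.pi) % (2.0 * np.pi) - np.pi


# Example: a linear phase with slope -0.5 rad per bin has a constant
# group delay of 0.5 under this finite-difference definition.
linear_phase = -0.5 * np.arange(4)[np.newaxis, :]
delays = group_delay(linear_phase)

# The loss depends only on the wrapped difference, so shifting the
# target by 2*pi leaves it unchanged.
target = np.array([0.3, -2.0, 3.0])
loss_same = von_mises_nll(target, target)
loss_shifted = von_mises_nll(target + 2.0 * np.pi, target)
```

Because the cosine term is periodic, such a loss treats phases that differ by a multiple of 2π as identical, which is the property that makes a directional distribution a natural fit for a periodic variable.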
