Inter-Frequency Phase Difference for Phase Reconstruction Using Deep Neural Networks and Maximum Likelihood

Nguyen Binh Thien, Takanobu Nishiura, Kenta Iwai, Yukoh Wakabayashi

doi:10.1109/taslp.2023.3268577

Abstract

This paper presents improvements to two-stage algorithms that use deep neural networks (DNNs) to estimate the short-time Fourier transform (STFT) phase from the amplitude alone. The phase is difficult to reconstruct because it is sensitive to waveform shifts and subject to the wrapping issue. To mitigate these problems, two-stage approaches estimate the phase indirectly through its derivatives, i.e., the instantaneous frequency (IF) and group delay (GD). In the first stage, the IF and GD are estimated from the amplitude using DNNs; in the second stage, the phase is reconstructed so that the IF/GD information is maintained. Conventional second-stage methods either do not consider the importance of high-amplitude time–frequency bins (e.g., the least squares-based method) or lack a solid model (e.g., the average-based method). To address these problems, we propose improvements to the second stage of two-stage algorithms by using von Mises distribution-based maximum likelihood and weighted least squares. We also provide theoretical discussions of phase reconstruction, including investigations of the properties of the GD and the roles of the IF/GD information in the inverse STFT. On the basis of this analysis, we propose a new phase-based feature, the inter-frequency phase difference (IFPD), and demonstrate its application in two-stage phase reconstruction algorithms. We conducted subjective and objective experiments to compare the performance of the proposed and conventional methods. The results confirm that the proposed method using the IFPD performs better than the other methods for all metrics.
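The two ingredients named in the abstract, phase differences and von Mises maximum likelihood, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`wrap`, `phase_differences`, `fuse_candidates`) are hypothetical, and the finite differences only stand in for the IF and GD up to scaling and sign conventions. The fusion step relies on a standard result: for von Mises observations with known concentrations, the maximum-likelihood mean direction is the angle of the concentration-weighted sum of unit phasors.

```python
import numpy as np


def wrap(x):
    """Wrap angles to the principal interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def phase_differences(phase):
    """Wrapped finite differences of a phase array shaped (freq_bins, frames).

    The time-direction difference is a discrete stand-in for the IF; the
    frequency-direction difference is a stand-in for the GD (conventionally
    its negative). Scaling and sign conventions here are assumptions.
    """
    dt = wrap(np.diff(phase, axis=1))  # along frames
    df = wrap(np.diff(phase, axis=0))  # along frequency bins
    return dt, df


def fuse_candidates(candidates, kappas):
    """ML mean direction of von Mises samples with known concentrations.

    Maximizing sum_i kappa_i * cos(theta_i - mu) over mu gives
    mu = angle(sum_i kappa_i * exp(j * theta_i)).
    """
    candidates = np.asarray(candidates, dtype=float)
    kappas = np.asarray(kappas, dtype=float)
    return np.angle(np.sum(kappas * np.exp(1j * candidates), axis=0))


# Toy check: integrating the time-direction differences recovers the phase
# (modulo 2*pi) once the first frame is known.
rng = np.random.default_rng(0)
phase = rng.uniform(-np.pi, np.pi, size=(5, 8))
dt, df = phase_differences(phase)
steps = np.concatenate([np.zeros((5, 1)), np.cumsum(dt, axis=1)], axis=1)
reconstructed = wrap(phase[:, :1] + steps)
print(np.allclose(wrap(reconstructed - phase), 0.0, atol=1e-9))
```

In a two-stage setting, one phase candidate for a bin could come from the previous frame plus the estimated IF and another from a neighboring frequency bin plus the estimated GD; `fuse_candidates` shows how such candidates might be combined while giving more weight to the more reliable one.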

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing	Publication Date: Jan 1, 2023
Citations: 4	License type: CC BY-NC-ND 4.0

Similar Papers

Weighted Inverse Short-Time Fourier Transform and Denoising Filters in the Time-Frequency Plane
Jian-Jiun Ding ... Yu-Chia Huang
28 Oct 2022

Cross-spectral methods for processing speech.
Douglas J Nelson
The Journal of the Acoustical Society of America | VOL. 110
01 Nov 2001

Phase Reconstruction from Amplitude Spectrograms Based on Von-Mises-Distribution Deep Neural Network
Shinnosuke Takamichi ... Daichi Kitamura
01 Sep 2018

STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency
Zhong-Qiu Wang ... Jonathan Le Roux
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 31
01 Jan 2023
