APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra

Yang Ai, Zhen-Hua Ling

doi:10.1109/taslp.2023.3277276

Abstract

This paper presents APNet, a novel neural vocoder that reconstructs speech waveforms from acoustic features by directly predicting amplitude and phase spectra. The APNet vocoder is composed of an amplitude spectrum predictor (ASP) and a phase spectrum predictor (PSP). The ASP is a residual convolution network that predicts frame-level log amplitude spectra from acoustic features. The PSP also applies a residual convolution network to the acoustic features, then passes the output of this network through two parallel linear convolution layers and feeds the two results into a phase calculation formula to estimate frame-level phase spectra. The outputs of the ASP and PSP are then combined to reconstruct speech waveforms by inverse short-time Fourier transform (ISTFT). All operations of the ASP and PSP are performed at the frame level. We train the ASP and PSP jointly and define multilevel loss functions based on amplitude mean square error, phase anti-wrapping error, short-time spectral inconsistency error, and time-domain reconstruction error. Experimental results show that the proposed APNet vocoder achieves approximately 8x faster inference than HiFi-GAN v1 on a CPU, owing to its all-frame-level operations, while its synthesized speech quality is comparable to that of HiFi-GAN v1. The synthesized speech quality of the APNet vocoder is also better than that of several equally efficient models. Ablation experiments confirm that the proposed parallel phase estimation architecture is essential to phase modeling and that the proposed loss functions help improve the synthesized speech quality.
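The signal-processing steps named in the abstract can be illustrated with a minimal NumPy sketch. Several details are assumptions, since the abstract does not specify them: the phase calculation formula is taken to be a two-argument arctangent of the two parallel outputs, the anti-wrapping function is taken to map a phase difference to its principal-value magnitude, and the ISTFT uses a Hann window with overlap-add. All function names and the `n_fft` and `hop` values are hypothetical, and random arrays stand in for the network outputs.

```python
import numpy as np

def phase_from_parallel_outputs(pseudo_real, pseudo_imag):
    # ASSUMPTION: the "phase calculation formula" is a two-argument arctangent
    # of the two parallel layer outputs, giving phases in [-pi, pi].
    return np.arctan2(pseudo_imag, pseudo_real)

def anti_wrapping(x):
    # ASSUMPTION: maps a phase difference to its principal-value magnitude,
    # so that differences of 2*pi*k count as zero error.
    return np.abs(x - 2.0 * np.pi * np.round(x / (2.0 * np.pi)))

def combine_spectra(log_amplitude, phase):
    # Complex spectrum from frame-level log amplitude and phase.
    return np.exp(log_amplitude) * np.exp(1j * phase)

def istft(spectrum, n_fft, hop):
    # Overlap-add ISTFT with a Hann synthesis window (illustrative choice).
    # spectrum has shape (frames, n_fft // 2 + 1).
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spectrum, n=n_fft, axis=-1) * window
    length = hop * (spectrum.shape[0] - 1) + n_fft
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        signal[start:start + n_fft] += frame
        norm[start:start + n_fft] += window ** 2
    # Guard against division by zero where the window sum vanishes.
    return signal / np.maximum(norm, 1e-8)

# Random stand-ins for the ASP output and the two parallel PSP outputs.
rng = np.random.default_rng(0)
n_fft, hop, n_frames = 16, 4, 5
log_amp = rng.normal(size=(n_frames, n_fft // 2 + 1))
pseudo_real = rng.normal(size=log_amp.shape)
pseudo_imag = rng.normal(size=log_amp.shape)

phase = phase_from_parallel_outputs(pseudo_real, pseudo_imag)
waveform = istft(combine_spectra(log_amp, phase), n_fft, hop)
```

Because every step operates on whole frames, no sample-level upsampling is involved, which is consistent with the speed advantage the abstract attributes to all-frame-level operations.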

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2023
Citations: 3

Similar Papers

Object Clusters or Spectral Energy? Assessing the Relative Contributions of Image Phase and Amplitude Spectra to Trypophobia.
R Nathan Pipitone ... Christopher Dimattina
Frontiers in Psychology | VOL. 11
24 Jul 2020

A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis
Yang Ai ... Zhen-Hua Ling
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 28
01 Jan 2020

Earthquake Source Dynamics from Farfield Amplitude and Phase Spectra of Body Waves
M Niazi
Geophysical Journal International | VOL. 37
01 Apr 1974

Machine Diagnosis Based on Amplitude-Phase Characteristics, Determined from the Experimental Amplitude Spectrum and the Calculated Phase Spectrum
Pawel Lindstedt ... Tomasz Sudakowski
28 Apr 2020
