Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

Yang Ai, Zhen-Hua Ling

doi:10.21437/interspeech.2020-1046

Abstract

In our previous work, we proposed HiNet, a neural vocoder that recovers speech waveforms by hierarchically predicting amplitude and phase spectra from input acoustic features. In HiNet, the amplitude spectrum predictor (ASP) predicts log amplitude spectra (LAS) from these features. This paper proposes a novel knowledge-and-data-driven ASP (KDD-ASP) to improve on the conventional one. First, the acoustic features (i.e., F0 and mel-cepstra) pass through a knowledge-driven LAS recovery module to obtain approximate LAS (ALAS). This module combines the STFT with source-filter theory: the source part is derived from the input F0 and the filter part from the input mel-cepstra. Then, a data-driven LAS refinement module, consisting of multiple trainable convolutional layers, processes the recovered ALAS to produce the final LAS. Experimental results show that, on a text-to-speech (TTS) task, the HiNet vocoder using KDD-ASP achieves higher synthetic speech quality than both the HiNet vocoder using the conventional ASP and the WaveRNN vocoder.
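
The abstract does not give the equations of the knowledge-driven LAS recovery module, so the following Python sketch only illustrates the general idea of combining a source spectrum derived from F0 with a filter spectrum derived from mel-cepstra. Everything specific here is an assumption made for illustration, not the authors' method: the function names, the pulse-train and Gaussian-noise excitations, the Hann window, the frame and FFT sizes, and the all-pass warping constant `alpha = 0.42` (a value commonly used for 16 kHz speech).

```python
import numpy as np

def warped_frequencies(n_bins, alpha):
    """Map linear frequencies in [0, pi] through a first-order all-pass warp."""
    omega = np.linspace(0.0, np.pi, n_bins)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def filter_log_amplitude(mcep, n_bins, alpha=0.42):
    """Filter part: log|H(w)| = sum_m c_m * cos(m * warped_w)."""
    warped = warped_frequencies(n_bins, alpha)
    orders = np.arange(len(mcep))
    return np.cos(np.outer(warped, orders)) @ np.asarray(mcep, dtype=float)

def source_log_amplitude(f0, fs, frame_len, n_fft, rng):
    """Source part: log STFT amplitude of a hypothetical excitation frame."""
    if f0 > 0:
        # Voiced frame (assumption): pulse train with one pulse per pitch period.
        period = fs / f0
        excitation = np.zeros(frame_len)
        idx = np.round(np.arange(0, frame_len, period)).astype(int)
        excitation[idx[idx < frame_len]] = np.sqrt(period)
    else:
        # Unvoiced frame (assumption): white Gaussian noise.
        excitation = rng.standard_normal(frame_len)
    windowed = excitation * np.hanning(frame_len)
    # Small constant avoids log(0) at spectral nulls.
    return np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-8)

def approximate_las(f0, mcep, fs=16000, frame_len=512, n_fft=512,
                    alpha=0.42, seed=0):
    """Approximate LAS of one frame: log source amplitude + log filter amplitude."""
    rng = np.random.default_rng(seed)
    n_bins = n_fft // 2 + 1
    source = source_log_amplitude(f0, fs, frame_len, n_fft, rng)
    return source + filter_log_amplitude(mcep, n_bins, alpha)

# Usage: one voiced frame at 200 Hz with a toy 3-coefficient mel-cepstrum.
alas = approximate_las(200.0, [0.5, 0.3, -0.1])
print(alas.shape)
```

Because the source-filter model multiplies the two amplitude spectra, their logarithms simply add, which is why the sketch sums the two parts. In the paper's pipeline, the resulting ALAS would then be passed to the trainable convolutional refinement module, which is not sketched here.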
