Abstract
This paper proposes to select frame-sized speech segments for waveform-concatenation speech synthesis using neural-network-based acoustic models. First, a deep neural network (DNN)-based frame selection method is presented. In this method, three DNNs are adopted: one to calculate target costs and two to calculate concatenation costs, for selecting candidate frames 5 ms in length. The first DNN is built in the same way as in DNN-based statistical parametric speech synthesis: it predicts target acoustic features given linguistic context inputs. The distance between the acoustic features of a candidate unit and those predicted for a target unit is calculated as the target cost. The other two DNNs are constructed to predict the acoustic features of the current frame from its context features and the acoustic features of preceding frames. At synthesis time, these two DNNs are employed to calculate the concatenation cost for each candidate frame given its preceding frames. Furthermore, recurrent neural networks (RNNs) with long short-term memory (LSTM) cells are adopted in place of DNNs for acoustic modeling, in order to make better use of sequential information. A strategy of using multiple frames instead of a single frame as the basic unit for selection is also presented, to reduce the number of concatenation points within synthetic speech. Experimental results show that the proposed method achieves better naturalness than both the hidden Markov model (HMM)-based frame selection method and HMM-based parametric speech synthesis.
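The cost structure described above can be illustrated with a minimal sketch. The function names, the Euclidean distance metric, and the single weight `w` below are illustrative assumptions, not the paper's exact formulation; the trained prediction networks are stood in for by a caller-supplied `predict_fn`:

```python
import numpy as np

def target_cost(predicted, candidate):
    # Distance between the acoustic features predicted for the target unit
    # and those of the candidate unit (Euclidean distance assumed here).
    return float(np.linalg.norm(predicted - candidate))

def concatenation_cost(predict_fn, preceding, candidate):
    # predict_fn stands in for the paper's concatenation-cost networks:
    # it predicts the current frame's acoustic features from preceding frames.
    expected = predict_fn(preceding)
    return float(np.linalg.norm(expected - candidate))

def select_frame(candidates, predicted_target, preceding, predict_fn, w=1.0):
    # Pick the candidate minimizing a weighted sum of target and
    # concatenation costs (weighting scheme is a hypothetical choice).
    costs = [
        target_cost(predicted_target, c)
        + w * concatenation_cost(predict_fn, preceding, c)
        for c in candidates
    ]
    return int(np.argmin(costs))
```

In the full method the search runs over a whole utterance (e.g. with dynamic programming over frame sequences); this sketch only shows the per-frame cost comparison.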