Abstract

Stress is an important parameter for prosody processing in speech synthesis. However, stress is not easy to predict from text analysis because the relevant information is complicated. In this paper, we explore the novel use of continuous lexical embeddings and a bidirectional long short-term memory recurrent neural network (BLSTM) model for sentential stress prediction in Mandarin speech synthesis. We augment the baseline features with word representations derived from text, which provide a continuous embedding of the lexicon in a low-dimensional space. Although learned in an unsupervised fashion, such features capture semantic and syntactic properties that make them amenable to stress prediction. We deploy various embedding models for Mandarin sentential stress prediction, showing substantial gains (a relative gain of approximately 7.4% in F1 score).
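The feature-augmentation idea described above can be illustrated with a minimal sketch: each token's baseline features (e.g. POS or tone features) are concatenated with its pretrained low-dimensional embedding to form the per-token input of a stress tagger. All names, dimensions, and values here are hypothetical stand-ins, not the paper's actual features or embeddings.

```python
# Toy illustration: augment baseline per-token features with a
# pretrained lexical embedding (values are hypothetical stand-ins).

embeddings = {                      # low-dimensional lexical embeddings
    "ni3": [0.2, -0.1, 0.5],
    "hao3": [0.4, 0.3, -0.2],
}
OOV = [0.0, 0.0, 0.0]               # fallback for out-of-vocabulary words

def token_features(word, baseline):
    """Concatenate baseline features with the word's embedding."""
    return baseline + embeddings.get(word, OOV)

# Per-token feature vectors like these would then be fed, in sequence,
# to a sequence model such as a BLSTM for stress-label prediction.
feats = token_features("ni3", [1.0, 0.0])
print(len(feats))  # 2 baseline dims + 3 embedding dims = 5
```

In a full system, the embedding lookup would come from an unsupervised model trained on large text corpora, and the concatenated vectors would be consumed by a bidirectional recurrent tagger rather than inspected directly.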
