Abstract

Stress is an important parameter for prosody processing in speech synthesis. However, stress is not easy to predict from text analysis because the relevant information is complicated. In this paper, we explore the novel use of continuous lexical embeddings and a bidirectional long short-term memory recurrent neural network (BLSTM) model for sentential stress prediction in Mandarin speech synthesis. We augment the baseline features with word representations derived from text, which provide a continuous embedding of the lexicon in a low-dimensional space. Although learned in an unsupervised fashion, such features capture semantic and syntactic properties that make them amenable to stress prediction. We deploy various embedding models for Mandarin sentential stress prediction, showing substantial gains (a relative gain of approximately 7.4% in F1 score).
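The feature-augmentation idea described above can be illustrated with a minimal sketch: each token's baseline features (e.g. POS or tone features) are concatenated with its pretrained low-dimensional embedding to form the per-token input of a stress tagger. All names, dimensions, and values here are hypothetical stand-ins, not the paper's actual features or embeddings.

```python
# Toy illustration: augment baseline per-token features with a
# pretrained lexical embedding (values are hypothetical stand-ins).

embeddings = {                      # low-dimensional lexical embeddings
    "ni3": [0.2, -0.1, 0.5],
    "hao3": [0.4, 0.3, -0.2],
}
OOV = [0.0, 0.0, 0.0]               # fallback for out-of-vocabulary words

def token_features(word, baseline):
    """Concatenate baseline features with the word's embedding."""
    return baseline + embeddings.get(word, OOV)

# Per-token feature vectors like these would then be fed, in sequence,
# to a sequence model such as a BLSTM for stress-label prediction.
feats = token_features("ni3", [1.0, 0.0])
print(len(feats))  # 2 baseline dims + 3 embedding dims = 5
```

In a full system, the embedding lookup would come from an unsupervised model trained on large text corpora, and the concatenated vectors would be consumed by a bidirectional recurrent tagger rather than inspected directly.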
