Abstract

This paper investigates the usability of two linguistic features, both extracted fully automatically from unrestricted input text, for prosody generation in a Mandarin text-to-speech (MTTS) system. The first is the base-phrase chunk feature, labeled by a conditional random field (CRF)-based base-phrase chunker. The second is the punctuation confidence (PC), which measures the likelihood of inserting a punctuation mark (PM) at a word boundary. It is calculated for each lexical word (LW) boundary from input text tagged with Chinese word boundaries, part-of-speech (POS) tags, and base-phrase chunks. Because a PM in text is highly correlated with a prosodic break, and because base phrases play an important role in human language understanding, the two features could provide useful information for prosody generation. To examine their usefulness, the performance of a neural network-based prosody generator was evaluated with and without the proposed features. Both objective and subjective tests showed that the prosody generator with the proposed linguistic features performed better than the one without them. The proposed PC and base-phrase chunking information are therefore promising features for Mandarin prosody generation.

