Abstract

This paper proposes linguistic, production and prosodic constraints for modeling the intonation patterns of a sequence of syllables. Linguistic constraints are represented by positional, contextual and phonological features; production constraints are represented by articulatory features; and prosodic constraints are represented by the durations and intensities of syllables. Neural network models are explored to capture the implicit intonation knowledge using these features. The prediction performance of the neural network models is evaluated using objective measures: the average prediction error (μ), the standard deviation (σ) and the linear correlation coefficient (γX,Y). The prediction performance of the feed-forward neural network (FFNN) models is compared with that of other statistical models, namely Classification and Regression Tree (CART) and Linear Regression (LR) models. The performance of the intonation models is also analyzed through listening tests that evaluate the quality of synthesized speech after the models are incorporated into a baseline TTS system.
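
The three objective measures named above can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions here are assumptions: μ is taken as the mean absolute difference between actual and predicted values, σ as the population standard deviation of those absolute errors, and γX,Y as the Pearson correlation between the two sequences. The function name `prediction_metrics` and the sample values are hypothetical and do not come from the paper.

```python
import math


def prediction_metrics(actual, predicted):
    """Return (mu, sigma, gamma) for two equal-length sequences.

    Assumed definitions (not stated in the abstract):
      mu    = mean absolute prediction error
      sigma = population standard deviation of the absolute errors
      gamma = Pearson linear correlation between actual and predicted
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(actual)

    # Average prediction error over all syllables.
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(abs_errors) / n

    # Spread of the errors around their mean.
    sigma = math.sqrt(sum((e - mu) ** 2 for e in abs_errors) / n)

    # Pearson correlation: covariance divided by the product of the
    # two standard deviations.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    if std_a == 0 or std_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    gamma = cov / (std_a * std_p)

    return mu, sigma, gamma


# Hypothetical per-syllable pitch values in Hz, for illustration only.
actual_f0 = [100.0, 120.0, 140.0, 160.0]
predicted_f0 = [110.0, 130.0, 150.0, 170.0]
mu, sigma, gamma = prediction_metrics(actual_f0, predicted_f0)
print(f"mu={mu:.2f} sigma={sigma:.2f} gamma={gamma:.2f}")
```

Under these assumed definitions, a lower μ and σ together with a γX,Y closer to 1 indicate predictions that track the actual values more closely. The same function could be applied to the outputs of the FFNN, CART and LR models for a like-for-like comparison.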
