Abstract

The approach described in this paper aims to bring additional knowledge into the design of concatenative text-to-speech systems. This knowledge is based on the masking phenomenon of the inner ear, in particular its temporal (forward) masking properties. Such a knowledge-based design is proposed for unit selection speech synthesis, currently a prominent concatenative technique, which relies on a large speech corpus. The more prosodic variability the corpus captures, the more natural the synthetic voice sounds, and the more opportunities there are for forward masking events to occur when the candidate units selected from the corpus are concatenated.
