Abstract

Unit selection speech synthesis systems generate synthetic speech by concatenating acoustic units extracted from natural recordings. Given a large speech database, a Viterbi search chooses the sequence of units with the best global cost. This work shows that small subcosts unrelated to perceptual measures can change the sequence of units that is finally chosen, with a potential effect on the quality of the synthetic speech. A segmentwise unit selection approach that minimises this effect is then proposed.
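
To make the Viterbi search over candidate units concrete, the following is a minimal sketch and not the implementation from this work. It assumes the global cost is a sum of a per-unit target cost and a pairwise concatenation cost, a common formulation that the abstract does not spell out. The names `select_units`, `target_cost` and `join_cost` are hypothetical, and the segmentwise approach proposed here is not shown.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")  # a candidate unit; its representation is left open (assumption)


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Return the unit sequence with the lowest summed cost, and that cost.

    candidates[i] lists the database units that could fill target position i.
    The cost decomposition (target + join) is an assumption for illustration.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[i][j]: lowest cumulative cost of any path ending at candidates[i][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor in candidates[i - 1] on that path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Pick the predecessor that minimises cumulative cost plus join cost.
            k = min(
                range(len(prev)),
                key=lambda p: best[i - 1][p] + join_cost(prev[p], u),
            )
            row_cost.append(best[i - 1][k] + join_cost(prev[k], u) + target_cost(i, u))
            row_back.append(k)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=lambda p: best[-1][p])
    total = best[-1][j]
    path: List[U] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

Because every subcost enters the same sum, a small change in any one term can tip the minimum toward a different path, which is the sensitivity the abstract describes.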

