Abstract
Unit selection speech systems generate synthetic speech by concatenating acoustic units extracted from a natural recording. Given a large speech database, the sequence of units with the best global cost is chosen by means of a Viterbi search. This work shows that small subcosts unrelated to perceptual measures can change the sequence of units that is finally chosen, with a potential effect on the quality of the synthetic speech. A segmentwise unit selection approach that minimises this effect is then proposed.
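As an illustration of the conventional search described above (not of the proposed segmentwise approach), the following is a minimal sketch of a Viterbi search over candidate units. It assumes the global cost is the plain sum of a per-unit target cost and a per-join concatenation cost; the function name `viterbi_unit_selection` and the `target_cost` and `concat_cost` callables are hypothetical and are not taken from the work itself.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi_unit_selection(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    concat_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Return the unit sequence with the lowest summed cost, and that cost.

    candidates[t] lists the database units available for target position t.
    The additive cost model is an assumption made for this sketch.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: lowest accumulated cost of any path ending at candidate j
    # of the current position.
    best = [target_cost(0, u) for u in candidates[0]]
    backptrs: List[List[int]] = []

    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_best: List[float] = []
        ptrs: List[int] = []
        for u in candidates[t]:
            # Pick the predecessor that minimises accumulated + join cost.
            k = min(range(len(prev)), key=lambda i: best[i] + concat_cost(prev[i], u))
            new_best.append(best[k] + concat_cost(prev[k], u) + target_cost(t, u))
            ptrs.append(k)
        best = new_best
        backptrs.append(ptrs)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [candidates[-1][j]]
    for t in range(len(candidates) - 1, 0, -1):
        j = backptrs[t - 1][j]
        path.append(candidates[t - 1][j])
    path.reverse()
    return path, total
```

Because the search compares summed costs only, any subcost folded into `target_cost` or `concat_cost` can tip the choice between two otherwise close paths, whether or not that subcost reflects a perceptual difference.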