Abstract

This paper describes a corpus-based approach applied to the evolution of ELOQUENS, the CSELT text-to-speech synthesis system for Italian, towards multi-voice, multilingual, high-naturalness concatenative synthesis. The acoustic modules have been redesigned to reduce both the number of junctions and the need for prosodic modification. Appropriate phonetic coverage methods were applied in the design of the acoustic database. Automatic processing tools performed phone and diphone segmentation, pitch marking, and prosodic feature detection. The synthesis algorithm makes the most of the speech material by searching the database for the longest suitable sequences, according to weighted distance measures on phonetic and prosodic parameters. Signal modification techniques are applied only when necessary, to smooth residual prosodic jumps at unit boundaries. The resulting voice is quite human-sounding.

Keyword: corpus-based concatenative synthesis
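The search for the longest suitable sequences can be illustrated with a minimal sketch. It is not the ELOQUENS implementation: the `Unit` fields, the relative pitch/duration distance, the weights, the `max_cost` threshold, and the greedy left-to-right strategy are all assumptions chosen for illustration. For each target position, the sketch scans every database utterance for the longest contiguous run whose phones match the target and whose weighted prosodic distance stays under the threshold, so that fewer junctions are needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One phone-sized unit with two illustrative prosodic features."""
    phone: str
    pitch_hz: float
    duration_ms: float


def weighted_distance(target, candidate, w_pitch=1.0, w_dur=0.5):
    """Weighted sum of relative pitch and duration differences (assumed form)."""
    d_pitch = abs(target.pitch_hz - candidate.pitch_hz) / target.pitch_hz
    d_dur = abs(target.duration_ms - candidate.duration_ms) / target.duration_ms
    return w_pitch * d_pitch + w_dur * d_dur


def longest_match(target, start, database, max_cost):
    """Find the longest database run matching target[start:].

    Returns (utterance_index, offset, length), or None if no unit fits.
    Ties in length are broken by the lower mean distance.
    """
    best_key, best = None, None
    for u, utterance in enumerate(database):
        for offset in range(len(utterance)):
            length, total = 0, 0.0
            while (start + length < len(target)
                   and offset + length < len(utterance)):
                t = target[start + length]
                c = utterance[offset + length]
                if c.phone != t.phone:
                    break
                cost = weighted_distance(t, c)
                if cost > max_cost:
                    break
                total += cost
                length += 1
            if length == 0:
                continue
            key = (length, -total / length)
            if best_key is None or key > best_key:
                best_key, best = key, (u, offset, length)
    return best


def select_units(target, database, max_cost=0.3):
    """Greedily cover the target with the longest suitable database runs."""
    selection, i = [], 0
    while i < len(target):
        match = longest_match(target, i, database, max_cost)
        if match is None:
            raise LookupError(f"no suitable unit for target position {i}")
        selection.append(match)
        i += match[2]
    return selection
```

Each returned tuple identifies one stretch of recorded speech, so the number of junctions equals the number of tuples minus one. In this sketch, any prosodic mismatch left at those junctions would be handled by a separate signal modification step, which is not shown here.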
