Abstract

A general-purpose isiZulu text-to-speech (TTS) system was developed, based on the ‘Multisyn’ unit-selection approach supported by the Festival TTS toolkit. The development involved a number of challenges at the interface between speech technology and linguistics: choosing an appropriate set of phonetic units, producing reliable pronunciations, and developing appropriate cost functions for selecting and joining diphone units. We show how solutions were found for each of these challenges, and describe a number of other innovations that were introduced, such as automated fault detection in manual alignments. Initial evaluations suggest that the synthesizer is usable by a wide spectrum of isiZulu speakers.
