A complete text-to-speech system for the Slovenian language

Matjaz Gams, Tomaz Sef

doi:10.5281/zenodo.37095

Abstract

The Slovenian text-to-speech engine is a modular system consisting of four independent modules (text normalization, grapheme-to-phoneme conversion, prosody generation and segmental concatenation) connected in a pipeline. Each module is responsible for one part of the problem of converting text into speech. The first two modules handle tasks such as end-of-sentence detection, expansion of abbreviations and numbers, conversion of special formats, morphological and contextual analysis, and phonological modelling. To derive the rules for our synthesis scheme, data were collected by analysing the readings of ten speakers, five male and five female. A two-level approach has been used for duration modelling, and a so-called superpositional approach for pitch modelling. The system is based on the concatenation of speech units (diphones and some frequently used polyphones) using the TD-PSOLA technique.
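TD-PSOLA (time-domain pitch-synchronous overlap-add) changes the pitch of a stored unit by cutting it into windowed segments centred on its pitch marks and overlap-adding them at a new spacing. The following is a minimal, hypothetical Python sketch of the pitch-modification step only. It assumes that the pitch marks (glottal-closure instants) of a voiced unit are already known. The function names `td_psola` and `nearest_mark` and the choice of a two-period Hann window are illustrative assumptions, not the authors' implementation. Duration modification, joining of adjacent units and energy normalisation are omitted.

```python
import bisect
import math


def hann(length):
    """Symmetric Hann window; its endpoints are zero."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def nearest_mark(marks, t):
    """Index of the analysis pitch mark closest to time t (in samples)."""
    j = bisect.bisect_left(marks, t)
    if j == 0:
        return 0
    if j == len(marks):
        return len(marks) - 1
    return j if marks[j] - t < t - marks[j - 1] else j - 1


def td_psola(x, marks, pitch_scale):
    """Hypothetical TD-PSOLA pitch modification (duration approximately kept).

    x           -- list of float samples of a voiced unit
    marks       -- sorted analysis pitch marks (sample indices), at least two
    pitch_scale -- new F0 / old F0 (e.g. 1.2 raises the pitch by 20 %)
    Returns (output samples, synthesis pitch marks).
    """
    if len(marks) < 2:
        raise ValueError("at least two pitch marks are required")
    if pitch_scale <= 0:
        raise ValueError("pitch_scale must be positive")

    # Local pitch period at each analysis mark, in samples.
    periods = []
    for k in range(len(marks)):
        if k == 0:
            p = marks[1] - marks[0]
        elif k == len(marks) - 1:
            p = marks[-1] - marks[-2]
        else:
            p = (marks[k + 1] - marks[k - 1]) / 2
        periods.append(max(1, round(p)))

    y = [0.0] * len(x)
    synth_marks = []
    t = float(marks[0])
    while t <= marks[-1]:
        k = nearest_mark(marks, t)
        p = periods[k]
        window = hann(2 * p + 1)  # spans two local pitch periods
        centre = round(t)
        synth_marks.append(centre)
        # Overlap-add the windowed analysis segment around the new mark.
        for i in range(-p, p + 1):
            src, dst = marks[k] + i, centre + i
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += x[src] * window[i + p]
        # Synthesis marks are spaced by the local period divided by the scale.
        t += p / pitch_scale
    return y, synth_marks
```

Successive Hann windows placed one period apart sum to one, so `pitch_scale=1.0` reproduces the interior of the input. No gain normalisation is applied here, which means raising the pitch also raises the signal level; a practical synthesizer would need to correct for this.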
