Abstract

The Slovenian text-to-speech engine is a modular system consisting of four independent modules (text normalization, grapheme-to-phoneme conversion, prosody generation and segmental concatenation) arranged in a pipeline. Each module is responsible for one part of the problem of converting text into speech. The first two modules cover tasks such as end-of-sentence detection, abbreviation and number expansion, conversion of special formats, morphological and contextual analysis, and phonological modelling. To generate the rules for our synthesis scheme, data were collected by analysing the readings of ten speakers, five male and five female. A two-level approach is used for duration modelling and a so-called superpositional approach for pitch modelling. Speech is produced by concatenating speech units (diphones and some frequently used polyphones) using the TD-PSOLA technique.
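
The four-stage pipeline described above can be sketched roughly as follows. This is only an illustrative sketch, not the engine's actual code: the function names, the shared state dictionary, and the placeholder logic inside each stage (lowercasing, one phoneme per letter, a constant 80 ms duration and 120 Hz pitch, diphone labels instead of TD-PSOLA waveform synthesis) are all assumptions made here to show how independent modules can be chained.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]


def normalize_text(state: State) -> State:
    # Hypothetical stand-in for end-of-sentence detection and
    # abbreviation/number expansion: lowercase and split on whitespace.
    return {**state, "tokens": state["text"].lower().split()}


def graphemes_to_phonemes(state: State) -> State:
    # Hypothetical stand-in: treat every letter as one phoneme. A real module
    # would apply morphological and contextual analysis here.
    phonemes = [ch for tok in state["tokens"] for ch in tok if ch.isalpha()]
    return {**state, "phonemes": phonemes}


def generate_prosody(state: State) -> State:
    # Hypothetical stand-in: assign an arbitrary constant duration (ms) and
    # pitch (Hz) to each phoneme instead of real duration and pitch models.
    prosody = [(p, 80, 120.0) for p in state["phonemes"]]
    return {**state, "prosody": prosody}


def concatenate_units(state: State) -> State:
    # Hypothetical stand-in: list the diphone labels that would be looked up
    # and joined; "_" marks silence. No waveform processing is done here.
    padded = ["_"] + state["phonemes"] + ["_"]
    units = [a + "-" + b for a, b in zip(padded, padded[1:])]
    return {**state, "units": units}


# The modules are independent; each one only reads and extends the state.
PIPELINE: List[Stage] = [
    normalize_text,
    graphemes_to_phonemes,
    generate_prosody,
    concatenate_units,
]


def synthesize(text: str) -> State:
    state: State = {"text": text}
    for stage in PIPELINE:
        state = stage(state)
    return state


if __name__ == "__main__":
    print(synthesize("Da")["units"])
```

Because each stage takes and returns the same state structure, a single module can be replaced or tested in isolation without touching the others, which is the main appeal of a pipelined design of this kind.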
