Abstract
One of the characteristics of the Slovenian language is that lexical stress can be located almost arbitrarily on every syllable in the word, which makes the pronunciation very difficult. Some pronunciation rules exist, but their precision is not sufficient for efficient speech synthesis. Therefore a machine-learning technique (decision trees or boosted decision trees) was applied in order to achieve better results. The paper presents a two level lexical stress assignment model for out of vocabulary Slovenian words used in our text-to-speech system. First, each vowel is determined, whether it is stressed or unstressed, and a type of lexical stress is assigned for every stressed vowel. Then, some corrections are made on the word level, according to the number of stressed vowels and the length of the word. For data sets we used the MULTEXT-East Slovene Lexicon, which was supplemented with lexical stress marks. The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that the accentuation in the Slovenian language is a very complex problem and a simple solution in the form of relatively simple rules is not possible.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.