Abstract

Finite-state transducers are frequently used for pronunciation lexicon representations in speech engines, in which memory and processing resources are scarce. This paper proposes two possibilities for further reducing the memory footprint of finite-state transducers representing pronunciation lexicons. First, different alignments of grapheme and allophone transcriptions are studied and a reduction in the number of states of up to 30% is reported. Second, a combination of grapheme-to-allophone rules with a finite-state transducer is proposed, which yields a 65% smaller finite-state transducer than conventional approaches.

Highlights

  • Consistent and accurate determination of word pronunciation is critical to the success of many speech technology applications

  • An ongoing challenge in human-robot interaction is to design efficient speech engines for natural human-robot communication [1, 2].

  • We report the results of experimenting with different alignments of the grapheme and allophone transcriptions of lexical items, which change the input and output alphabets of the finite-state transducers (FSTs)


Summary

Introduction

Consistent and accurate determination of word pronunciation is critical to the success of many speech technology applications. Most state-of-the-art speech engines performing automatic speech recognition (ASR) and text-to-speech synthesis (TTS) rely on lexicons, which contain pronunciation information for many words. To provide maximum coverage of the words, multiword expressions, or even phrases that commonly occur in a given application domain, application-specific word or phrase pronunciations may be required, especially for application-specific proper nouns such as personal names or names of locations. Pronunciation lexicons for speech engines contain grapheme and allophone transcriptions of lexical words. The “x-sampa-SI-reduced” phonetic alphabet, a subset of the X-SAMPA set as defined for Slovenian [3], is used in the allophone transcriptions.
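To make the link between alignment and transducer size concrete, the following is a minimal sketch, not the authors' implementation. It assumes a lexicon is given as sequences of aligned grapheme:allophone pairs, builds a prefix tree over those pairs, and merges equivalent states bottom-up (the standard approach for acyclic automata). The two toy words, their symbols, the `EPS` placeholder, and all function names are hypothetical illustrations; they do not use the x-sampa-SI-reduced alphabet and do not reproduce the figures reported in the paper.

```python
from typing import Dict, List, Tuple

EPS = "-"  # hypothetical placeholder for an empty (epsilon) symbol
Pair = Tuple[str, str]  # one arc label: (grapheme, allophone)


def build_trie(entries: List[List[Pair]]) -> dict:
    """Build a prefix tree whose arcs are labelled with grapheme:allophone pairs."""
    root = {"final": False, "arcs": {}}
    for pairs in entries:
        node = root
        for pair in pairs:
            node = node["arcs"].setdefault(pair, {"final": False, "arcs": {}})
        node["final"] = True
    return root


def minimize(node: dict, registry: Dict[tuple, int]) -> int:
    """Merge equivalent states bottom-up and return the id of this state's class.

    Two states are equivalent when they agree on finality and have the same
    outgoing (label, target class) arcs.
    """
    arcs = tuple(sorted((label, minimize(child, registry))
                        for label, child in node["arcs"].items()))
    signature = (node["final"], arcs)
    return registry.setdefault(signature, len(registry))


def count_states(entries: List[List[Pair]]) -> int:
    """Number of states left after merging equivalent states."""
    registry: Dict[tuple, int] = {}
    minimize(build_trie(entries), registry)
    return len(registry)


# Toy lexicon (assumed for illustration): the same two entries, aligned two ways.
# In the first, the empty symbol is always placed on the second grapheme.
consistent = [
    [("s", "S"), ("h", EPS), ("i", "I"), ("p", "p")],
    [("s", "S"), ("h", EPS), ("o", "O"), ("p", "p")],
]
# In the second, the empty symbol is placed differently in each entry.
inconsistent = [
    [("s", "S"), ("h", EPS), ("i", "I"), ("p", "p")],
    [("s", EPS), ("h", "S"), ("o", "O"), ("p", "p")],
]

print(count_states(consistent))    # 5
print(count_states(inconsistent))  # 7
```

In this toy case the consistently aligned entries share their first two arcs, so fewer states remain after merging. Real lexicons and dedicated FST toolkits will behave differently in detail, so the numbers here only illustrate why the choice of alignment can affect the state count.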

