Phonemic transcription by analogy in text-to-speech synthesis: Novel word pronunciation and lexicon compression

Paul C Bagshaw

doi:10.1006/csla.1998.0042

Abstract

The synthesis of speech from unrestricted text needs a phonemic transcription, including syllabification and lexical stress, for each word and symbol. Speech synthesizers currently use large lexicons to provide such transcriptions, but not every word has a lexical entry, and a backup is required to produce transcriptions for novel words. In addition, synthesizers do not have an infinite amount of memory at their disposal, so it is not always possible to keep appending supplementary lexemes for specialized applications in the hope of reducing the probability of encountering a novel word. Transcriptions for novel words are produced by implicit analogy with an existing lexicon. A data-driven technique is proposed for extracting context-dependent grapheme-to-phoneme rules, with dynamically minimized context lengths, from a training lexicon. Syllable boundary and lexical stress information is included in the transcriptions. The proposed system satisfies certain pragmatic constraints: it can produce transcriptions rapidly enough to maintain real-time processing in a text-to-speech system; the rules take up a small amount of storage (370 KBytes); and a pronunciation can be generated for any novel word. The quality of the transcription process enables 77·06% of the lexemes formerly present in the training lexicon to be excluded, thus reducing the lexicon's memory requirements by 74·18% (of 3·57 MBytes).
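The idea of context-dependent rules with minimized context lengths can be illustrated with a minimal sketch. This is not the algorithm of the paper, whose details are not given in the abstract; it is an assumed simplification in which each letter of a training word is already aligned with exactly one phoneme symbol (`_` for a silent letter), contexts grow symmetrically, and syllable boundaries and lexical stress are omitted. The toy lexicon, the phoneme symbols, and the names `learn_rules`, `transcribe` and `MAX_K` are all hypothetical.

```python
from collections import Counter, defaultdict

PAD = "#"    # word-boundary marker (assumption of this sketch)
MAX_K = 2    # largest context width tried on each side (assumption)

# Hypothetical toy lexicon: one phoneme symbol per letter, "_" = silent.
TOY_LEXICON = [
    ("cat",  ["k", "ae", "t"]),
    ("city", ["s", "ih", "t", "iy"]),
    ("cot",  ["k", "aa", "t"]),
    ("cell", ["s", "eh", "l", "_"]),
    ("lit",  ["l", "ih", "t"]),
]

def context(word, i, k):
    """Return (left, letter, right) with k letters of context on each side."""
    padded = PAD * k + word + PAD * k
    j = i + k
    return (padded[j - k:j], padded[j], padded[j + 1:j + 1 + k])

def learn_rules(lexicon, max_k=MAX_K):
    # Count which phonemes each context maps to, for every width 0..max_k.
    seen = defaultdict(Counter)
    for word, phones in lexicon:
        for i, ph in enumerate(phones):
            for k in range(max_k + 1):
                seen[context(word, i, k)][ph] += 1

    # For each letter occurrence keep only the narrowest unambiguous context.
    # Occurrences still ambiguous at max_k yield no rule and use the fallback.
    rules = {}
    for word, phones in lexicon:
        for i, ph in enumerate(phones):
            for k in range(max_k + 1):
                ctx = context(word, i, k)
                if len(seen[ctx]) == 1:
                    rules[ctx] = ph
                    break

    # Context-free fallback: the most frequent phoneme for each letter.
    fallback = {
        letter: counts.most_common(1)[0][0]
        for (left, letter, right), counts in seen.items()
        if not left and not right
    }
    return rules, fallback

def transcribe(word, rules, fallback, max_k=MAX_K):
    out = []
    for i, letter in enumerate(word):
        for k in range(max_k + 1):          # narrowest matching rule wins
            ph = rules.get(context(word, i, k))
            if ph is not None:
                break
        else:
            ph = fallback.get(letter, "?")  # "?" marks an unseen letter
        out.append(ph)
    return out

rules, fallback = learn_rules(TOY_LEXICON)
print(transcribe("cite", rules, fallback))  # word absent from the lexicon
print(transcribe("call", rules, fallback))
```

In this sketch a letter that is always pronounced the same way in the training data, such as `t`, needs only a context-free rule, whereas `c` needs one neighbouring letter on each side to be disambiguated. Keeping only the narrowest sufficient context keeps the rule set small, and the per-letter fallback ensures that some transcription is produced for any novel word.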

Full Text