Abstract

This paper describes a morphological component in a speech recognition system for German that deals with the construction of complex word form hypotheses out of a lattice of simplex forms. Our example is the recognition of compounds from their individual components. Evaluation results are presented for speech recognition with and without morphologically based word recognition.

Goals and motivation

This paper proposes a strategy for partially satisfying the growing demands on speech recognition systems (e.g. large vocabulary recognition, few domain restrictions, robustness, and unknown word recognition) by integrating morphological knowledge into the speech recognition process. Current stochastic word recognizers have, for example, certain difficulties with compound word forms. Compounds can be defined as words which are built compositionally from other words or stems of words that can occur as free forms. Examples of German compounds are:

Arzttermin "doctor's appointment" (constituents: Arzt, Termin)
Arbeitsamt "employment office" (constituents: Arbeit, Amt)
Wochenendtermin "weekend appointment" (constituents: Woche, Ende, Termin)

Compounding is a frequent phenomenon in spontaneous speech. In the current VERBMOBIL transliteration corpus of [...] wordform tokens and the related lexical database of [...] wordform types, the token frequency of compounds is [...]; the type frequency amounts to [...].

Both compounds and their individual constituents were included in the recognition dictionary, and most of the compounds, as well as their individual constituents, though in almost all their possible inflected forms, occurred in the output lattice of the stochastic word recognition system (cf. Hübener et al.). A dictionary of this kind is highly redundant; large dictionaries reduce the speed of stochastic word recognition; and, in view of the infinite number of potential out-of-vocabulary compounds, an exhaustive lexical listing is simply not feasible.

For the task of recognizing out-of-vocabulary words, the employment of phonotactic constraints on well-formed syllable structures has already been tested (see e.g. Jusek et al.). Since complex words consist of units which are members of a finite set of morphs, it is also possible to specify morphotactic rules which operate on this finite morph lexicon to derive complex word forms. It is obvious that the set of actual morphs (those which are lexicalized in a morph lexicon) is only a subset of the set of potential morphs (those which satisfy the phonotactic constraints). Thus an integration of morphological knowledge leads to more specific constraints on out-of-vocabulary complex word forms.

Occurrences of discontinuous (split) word forms are a further problem in recognizing spontaneous speech. These often cannot be detected by speech recognition systems because their phonological material is torn apart by slips of the tongue, repetitions, pauses, or other insertions. An analysis of split word forms in our corpus demonstrated that most are compounds split at morphological boundaries. Although split compounds are not easily recognized by stochastic [...]

This paper was originally published in: Dafydd Gibbon (ed.), Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference, Bielefeld, October [...], pp. [...]. Berlin etc.: Mouton de Gruyter.

The Treatment of Compounds in a Morphological Component for Speech Recognition
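
The idea of deriving compound hypotheses from a lattice of simplex forms by means of morphotactic rules over a finite morph lexicon can be illustrated with a minimal Python sketch. It is not the component described in this paper. The `Hypothesis` record, the toy sets `MODIFIER_FORMS` and `HEAD_FORMS`, the strict frame-adjacency test, and the additive score are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One word hypothesis in the lattice (hypothetical structure)."""
    start: int    # first frame covered by the hypothesis
    end: int      # frame at which the hypothesis ends
    form: str     # orthographic form
    score: float  # assumed acoustic cost, lower is better


# Toy morph lexicon (assumption): forms allowed as non-final constituents
# and forms allowed as the final constituent (head) of a compound.
MODIFIER_FORMS = {"arzt", "arbeits", "wochen", "end"}
HEAD_FORMS = {"termin", "amt", "ende"}


def join(parts):
    """Merge a chain of adjacent hypotheses into one compound hypothesis."""
    form = "".join(p.form.lower() for p in parts).capitalize()
    return Hypothesis(parts[0].start, parts[-1].end, form,
                      sum(p.score for p in parts))


def compound_hypotheses(lattice, max_parts=3):
    """Build compound hypotheses from time-adjacent simplex hypotheses.

    Morphotactic rule used here (an assumption): one or more modifier
    forms followed by exactly one head form, with at most max_parts parts.
    """
    by_start = {}
    for hyp in lattice:
        by_start.setdefault(hyp.start, []).append(hyp)

    results = []

    def extend(parts):
        # Only hypotheses that start where the last part ends are adjacent.
        for nxt in by_start.get(parts[-1].end, []):
            form = nxt.form.lower()
            if form in HEAD_FORMS:
                results.append(join(parts + [nxt]))
            if form in MODIFIER_FORMS and len(parts) + 1 < max_parts:
                extend(parts + [nxt])

    for hyp in lattice:
        if hyp.form.lower() in MODIFIER_FORMS:
            extend([hyp])
    return results


example_lattice = [
    Hypothesis(0, 30, "Wochen", 1.0),
    Hypothesis(30, 50, "End", 0.5),
    Hypothesis(50, 90, "Termin", 1.5),
]
for compound in compound_hypotheses(example_lattice):
    print(compound.form, compound.start, compound.end, compound.score)
```

In this sketch the compound Wochenendtermin never has to be listed in the dictionary: it is derived from three simplex hypotheses that are already in the lattice, which is the redundancy argument made above. A realistic component would also have to tolerate small gaps or overlaps between hypotheses and would need a principled way of combining scores.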
