The Treatment of Compounds in a Morphological Component for Speech Recognition

Frederek Althoff, Harald Lüngen, Martina Pampel, Guido Drexel, Christoph Schillo

doi:10.22028/d291-25275

Abstract

This paper describes a morphological component in a speech recognition system for German, dealing with the construction of complex word form hypotheses out of a lattice of simplex forms. Our example is the recognition of compounds from their individual components. Evaluation results are presented for speech recognition with and without morphologically based word recognition.

Goals and motivation

This paper proposes a strategy for partially satisfying the growing demands on speech recognition systems (e.g. large vocabulary recognition, few domain restrictions, robustness and unknown word recognition) by integrating morphological knowledge into the speech recognition process. Current stochastic word recognizers have, for example, certain difficulties with compound word forms. Compounds can be defined as words which are built compositionally from other words, or stems of words, that can occur as free forms. Examples of German compounds are Arzttermin (constituents: Arzt, Termin), Arbeitsamt (constituents: Arbeit, Amt) and Wochenendtermin (constituents: Woche, Ende, Termin).

Compounding is a frequent phenomenon in spontaneous speech. In the current VERBMOBIL transliteration corpus of [...] wordform tokens and the related lexical database of [...] wordform types, the token frequency of compounds is [...]; the type frequency amounts to [...].

Both compounds and their individual constituents were included in the recognition dictionary, and most of the compounds, as well as their individual constituents (but in almost all their possible inflected forms), occurred in the output lattice of the stochastic word recognition system (cf. Hübener et al.). A dictionary of this kind is highly redundant; large dictionaries reduce the speed of the stochastic word recognition; and, in view of the infinite number of potential out-of-vocabulary compounds, an exhaustive lexical listing is simply not feasible.

For the task of recognizing out-of-vocabulary words, the employment of phonotactic constraints on well-formed syllable structures has already been tested (see e.g. Jusek et al.). Since complex words consist of units which are members of a finite set of morphs, it is also possible to specify morphotactic rules which operate on this finite morph lexicon to derive complex word forms. It is obvious that the set of actual morphs (those which are lexicalized in a morph lexicon) is only a subset of the set of potential morphs (those which satisfy the phonotactic constraints). Thus an integration of morphological knowledge leads to more specific constraints on out-of-vocabulary complex word forms.

Occurrences of discontinuous (split) word forms are a further problem in recognizing spontaneous speech. These often cannot be detected by speech recognition systems because their phonological material is torn apart by slips of the tongue, repetitions, pauses or other insertions. An analysis of split word forms in our corpus demonstrated that most are compounds split at morphological boundaries. Although split compounds are not easily recognized by stochastic [...]

This paper was originally published in Dafydd Gibbon (ed.), Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference, Bielefeld, October [...], pp. [...]. Berlin etc.: Mouton de Gruyter.
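The idea of deriving compound hypotheses from a lattice of simplex forms by means of morphotactic rules over a finite morph lexicon can be illustrated with a small sketch. It is not the component described in this paper, whose rules and data structures are not given in this excerpt. The `Edge` type, the two toy lexicon sets, the function `compound_hypotheses` and the `max_parts` limit are all hypothetical names chosen for illustration. The single assumed rule is that a compound is a sequence of adjacent lattice edges in which every non-final part is a listed modifier form (including any linking element, as in "arbeits" or "wochen") and the final part is a listed head form.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One word hypothesis in the lattice (hypothetical representation)."""
    start: int  # start node, e.g. a time index
    end: int    # end node
    form: str   # recognized simplex form, lowercased


# Toy morph lexicon (an assumption for this sketch, not the paper's lexicon).
# Modifier forms already carry their linking element where one is needed.
MODIFIER_FORMS = {"arzt", "arbeits", "wochen", "end"}
HEAD_FORMS = {"termin", "amt"}


def compound_hypotheses(lattice, max_parts=3):
    """Return new edges spanning adjacent simplex edges that form a compound.

    Assumed morphotactic rule: modifier+ head, with at most max_parts parts.
    """
    # Index edges by start node so adjacent continuations are easy to find.
    by_start = {}
    for edge in lattice:
        by_start.setdefault(edge.start, []).append(edge)

    results = []

    def extend(path):
        last = path[-1]
        # A path of two or more parts ending in a head form is a compound.
        if len(path) >= 2 and last.form in HEAD_FORMS:
            form = "".join(part.form for part in path)
            results.append(Edge(path[0].start, last.end, form))
        # Only a modifier form may be followed by further parts.
        if len(path) < max_parts and last.form in MODIFIER_FORMS:
            for nxt in by_start.get(last.end, []):
                extend(path + [nxt])

    for edge in lattice:
        extend([edge])
    return results


if __name__ == "__main__":
    lattice = [
        Edge(0, 1, "arzt"), Edge(0, 1, "art"), Edge(1, 2, "termin"),
        Edge(2, 3, "wochen"), Edge(3, 4, "end"), Edge(4, 5, "termin"),
    ]
    for hyp in compound_hypotheses(lattice):
        print(hyp)
```

The sketch adds every licensed compound as a new edge and leaves the choice between competing hypotheses (for example a shorter compound nested inside a longer one) to later processing. Acoustic scores and inflection are ignored here.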
