Two-level morphology with composition

Lauri Karttunen, Annie Zaenen, Ronald M. Kaplan

doi:10.3115/992066.992091

Abstract

Lauri Karttunen, Ronald M. Kaplan, and Annie Zaenen
Xerox Palo Alto Research Center
Center for the Study of Language and Information, Stanford University

1. Limitations of Kimmo systems

The advent of two-level morphology (Koskenniemi [1], Karttunen [2], Antworth [3], Ritchie et al. [4]) has made it relatively easy to develop adequate morphological (or at least morphographical) descriptions for natural languages, clearly superior to earlier cut-and-paste approaches to morphology. Most of the existing Kimmo systems developed within this paradigm consist of

• linked lexicons stored as annotated letter trees
• morphological information on the leaf nodes of trees
• transducers that encode morphological alternations

An analysis of an inflected word form is produced by mapping the input form to a sequence of lexical forms through the transducers and by composing some output from the annotations on the leaf nodes of the lexical paths that were traversed. Comprehensive morphological descriptions of this type have been developed for several languages including Finnish, Swedish, Russian, English, Swahili, and Arabic.

Although they have several good features, these Kimmo systems also have some limitations. The ones we want to address in this paper are the following:

(1) Lexical representations tend to be arbitrary. Because it is difficult to write and test two-level systems that map between pairs of radically dissimilar forms, lexical representations in existing two-level analyzers tend to stay close to the surface forms. This is not a problem for morphologically simple languages like English because, for most words, inflected forms are very similar to the canonical dictionary entry. Except for a small number of irregular verbs and nouns, it is not difficult to create a two-level description for English in which lexical forms coincide with the canonical citation forms found in a dictionary.
However, current analyzers for morphologically more complex languages (Finnish and Russian, for example) are not as satisfying in this respect. In these systems, lexical forms typically contain diacritic markers and special symbols; they are not real words in the language. For example, in Finnish the lexical counterpart of otin 'I took' might be rendered as otTa1I1n, where T, a1, and I1 are an arbitrary encoding of morphological alternations that determine the allomorphs of the stem and the past tense morpheme. The canonical citation form ottaa 'to take' is composed from annotations on the leaf nodes of the letter trees that are linked to match the input. It is not in any direct way related to the lexical form produced by the transducers.

(2) Morphological categories are not directly encoded as part of the lexical form. Instead of morphemes like Plural or Past, we typically see suffix strings like +s and +ed, which do not by themselves indicate what morpheme they express. Different realizations of the same morphological category are often represented as different even on the lexical side.

These characteristics lead to some undesirable consequences:

Proc. of COLING-92, Nantes, Aug. 23-28, 1992
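To make the architecture described in Section 1 concrete, here is a minimal Python sketch of a Kimmo-style analyzer, not the authors' implementation. It assumes a toy set of linked lexicons (`Root`, `Tense`, `Person`), hypothetical leaf annotations (`ottaa V`, `PAST`, `SG1`), and an assumed reading of the diacritic symbols as `T`, `a1`, and `I1`. The rule transducers are replaced by a context-free table of lexical:surface pairs, which is a deliberate simplification.

```python
# Toy Kimmo-style analyzer. Lexicon names, entries, annotations, and the
# pair table are illustrative assumptions, not data from the paper.

EPS = ""  # a lexical symbol realized as nothing on the surface

# Feasible lexical:surface pairs. A real system would restrict these with
# rule transducers that check context; here any pair may apply anywhere.
PAIRS = {
    "o": {"o"}, "t": {"t"}, "n": {"n"},
    "T": {"t", EPS},    # assumed: consonant that may disappear
    "a1": {"a", EPS},   # assumed: stem vowel that may drop
    "I1": {"i"},        # assumed: past tense marker
}

# Linked lexicons: (lexical symbols, leaf annotation, continuation lexicon).
LEXICONS = {
    "Root": [(("o", "t", "T", "a1"), "ottaa V", "Tense")],
    "Tense": [(("I1",), "PAST", "Person")],
    "Person": [(("n",), "SG1", None)],
}


def build_trie(entries):
    """Store entries as a letter tree whose end nodes carry annotations."""
    root = {"children": {}, "leaves": []}
    for symbols, annotation, next_lexicon in entries:
        node = root
        for sym in symbols:
            node = node["children"].setdefault(
                sym, {"children": {}, "leaves": []})
        node["leaves"].append((annotation, next_lexicon))
    return root


TRIES = {name: build_trie(entries) for name, entries in LEXICONS.items()}


def analyze(surface, lexicon="Root"):
    """Return (lexical form, composed annotations) pairs for a surface form."""
    results = []

    def walk(node, pos, lexical, notes):
        # At an annotated node, either finish or continue in the linked lexicon.
        for annotation, next_lexicon in node["leaves"]:
            collected = notes + [annotation]
            if next_lexicon is None:
                if pos == len(surface):
                    results.append(("".join(lexical), " ".join(collected)))
            else:
                walk(TRIES[next_lexicon], pos, lexical, collected)
        # Follow each lexical symbol whose surface realization matches here.
        for sym, child in node["children"].items():
            for realization in PAIRS.get(sym, ()):
                if surface.startswith(realization, pos):
                    walk(child, pos + len(realization), lexical + [sym], notes)

    walk(TRIES[lexicon], 0, [], [])
    return results


print(analyze("otin"))
```

In this sketch the citation form ottaa appears only in the leaf annotation, while the lexical form is the string of diacritic-laden symbols matched along the path, which is the disconnect that limitation (1) describes. Because the pair table ignores context, the sketch also accepts forms that a properly constrained rule set would reject.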
