Abstract

A lexical subsystem that contains a morphological-level parser is necessary for processing natural languages in general and inflectional languages in particular. Such a subsystem should be able to generate the surface form of a word (i.e., the form in which it appears in a natural sentence), given the sequence of morphemes constituting the word. Conversely, and more importantly, the subsystem should be able to parse a word into its constituent morphemes. We discuss a formalism that enables the lexicon writer to specify the lexicon of an inflectional language. The specifications are used to build a lexical description consisting of a lexical database on the one hand and a formulation of derivational morphology, called Augmented Finite State Automata (AFSA), on the other. The result is a compact lexical representation in which both generation of the surface forms of a word and parsing of a word are performed in a computationally attractive manner. The output of parsing is suitable as input to the next stage of analysis in a Natural Language Processing (NLP) environment, which in our case is based on a generalization of Lexical Functional Grammar (LFG). The application of the formalism to inflectional Indian languages is considered, with Bengali, a modern Indian language, as a case study.
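
To make the two directions concrete, the following is a minimal sketch of a finite-state morphological parser and generator. It is not the AFSA formalism itself, whose details are not given here. The toy English lexicon, the state names, and the functions `parse` and `generate` are all hypothetical, and the sketch assumes plain concatenation with no spelling changes at morpheme boundaries. Each transition consumes one morpheme and contributes features, loosely mirroring the idea of a parse result that can be handed on to a later, feature-based stage of analysis.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]

# Hypothetical toy lexicon (illustration only): surface morpheme -> features.
STEMS: Dict[str, Features] = {
    "walk": {"pred": "walk"},
    "talk": {"pred": "talk"},
}
SUFFIXES: Dict[str, Features] = {
    "ed": {"tense": "past"},
    "ing": {"aspect": "progressive"},
    "s": {"person": "3", "number": "sg"},
}

# Automaton: state -> list of (morpheme table, next state).
TRANSITIONS: Dict[str, List[Tuple[Dict[str, Features], str]]] = {
    "START": [(STEMS, "STEM")],
    "STEM": [(SUFFIXES, "END")],
}
FINAL_STATES = {"STEM", "END"}


def parse(word: str, state: str = "START") -> List[Tuple[List[str], Features]]:
    """Return every (morpheme list, merged features) analysis of `word`."""
    results: List[Tuple[List[str], Features]] = []
    # An analysis is complete when the input is used up in a final state.
    if not word and state in FINAL_STATES:
        results.append(([], {}))
    for table, next_state in TRANSITIONS.get(state, []):
        for morpheme, feats in table.items():
            if word.startswith(morpheme):
                rest = word[len(morpheme):]
                for morphs, rest_feats in parse(rest, next_state):
                    results.append(([morpheme] + morphs, {**feats, **rest_feats}))
    return results


def generate(morphemes: List[str]) -> str:
    """Return the surface form if the morpheme sequence is accepted."""
    state = "START"
    for m in morphemes:
        for table, next_state in TRANSITIONS.get(state, []):
            if m in table:
                state = next_state
                break
        else:
            raise ValueError(f"morpheme {m!r} not allowed in state {state}")
    if state not in FINAL_STATES:
        raise ValueError("incomplete morpheme sequence")
    # Assumption: pure concatenation, no boundary spelling rules.
    return "".join(morphemes)
```

Under these assumptions, `parse("walked")` yields a single analysis with morphemes `["walk", "ed"]` and features `{"pred": "walk", "tense": "past"}`, and `generate(["walk", "ing"])` yields `"walking"`. A realistic system for an inflectional language would also need rules for the spelling changes that occur where morphemes meet, which this sketch leaves out.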
