Abstract

An NLP system for Indian languages should have a lexical subsystem that is driven by a morphological analyzer. Such an analyzer should be able to parse a word into its constituent morphemes and obtain the lexical projection of the word as a unification of the projections of the constituent morphemes. The lexical projections considered here are the f-structures of Lexical Functional Grammar (LFG). A formalism has been proposed by which the lexicon writer may specify the lexicon at four levels. To achieve a compact lexical representation, the specifications are compiled into a stored lexical knowledge base on the one hand and into a formulation of derivational morphology called Augmented Finite State Automata (AFSA) on the other. The aspects of AFSA, especially its power to parse words morphologically in a computationally attractive manner, have been discussed. An additional use of the AFSA, as a spelling error corrector, has also been discussed. Bangla (Bengali) is considered as a case study. Implementation notes based on object-oriented programming principles have been provided.
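The idea of parsing a word into morphemes and unifying their projections can be illustrated with a minimal sketch. This is not the AFSA formulation itself, which the abstract does not specify. It assumes a toy finite-state lexicon, flat attribute-value maps standing in for f-structures, and romanized Bangla forms (`chele`, `ra`, `der`) chosen purely for illustration. All names here (`LEXICON`, `FINAL`, `unify`, `parse`) and the feature labels are hypothetical.

```python
# Toy morphological parser: segments a word by walking a small
# finite-state lexicon and unifies the feature maps of the morphemes.
# The lexicon entries and feature names are illustrative assumptions.

def unify(a, b):
    """Unify two flat feature maps; return None if any value conflicts."""
    out = dict(a)
    for key, value in b.items():
        if key in out and out[key] != value:
            return None
        out[key] = value
    return out

# state -> list of (morpheme, features, next_state)
LEXICON = {
    "root": [
        ("chele", {"PRED": "boy", "CAT": "N"}, "noun"),
    ],
    "noun": [
        ("ra", {"NUM": "pl", "CASE": "nom"}, "end"),
        ("der", {"NUM": "pl", "CASE": "gen"}, "end"),
    ],
}

# States in which a word is allowed to end.
FINAL = {"noun", "end"}

def parse(word, state="root", fs=None):
    """Return every (segmentation, unified features) analysis of word."""
    fs = {} if fs is None else fs
    results = []
    if not word and state in FINAL:
        results.append(([], fs))
    for morph, feats, next_state in LEXICON.get(state, []):
        if word.startswith(morph):
            merged = unify(fs, feats)
            if merged is None:
                continue  # incompatible projections: reject this path
            for segs, final_fs in parse(word[len(morph):], next_state, merged):
                results.append(([morph] + segs, final_fs))
    return results
```

Calling `parse("chelera")` under these assumptions yields a single analysis that segments the word as `chele` + `ra` and merges the features of both morphemes into one map. A word with no path through the lexicon returns an empty list.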
