Abstract

One of the major limitations of current NLP systems is the poor encoding of lexical knowledge (morphological lexicon, grammar, and semantic dictionary). This paper describes DANTE, a high-coverage system for natural language processing and query answering. At the current state of implementation, the morphological analyzer provides 100% coverage of the corpus (5,000 press agency releases containing about 100,000 different words), and the parser correctly analyzes 80% of the sentences. A semantic lexicon provides a detailed case-based representation of word senses. The morphological lexicon (10,000 elementary lemmata plus affixes and suffixes) and the grammar (100 rules) were entered manually; during the first phase of the DANTE project, the semantic knowledge was also encoded manually. More recently, a methodology for the semi-automatic acquisition of a case-based semantic lexicon has been devised.

