Abstract

This article deals with the acquisition of lexical knowledge, which is instrumental in supporting the inherently ambiguous process of natural language processing (NLP). Lexical representations are imprecise by nature and are mostly simple and superficial; the thesaurus is an apt example. The two primary resources for acquiring lexical knowledge are ‘corpora’ and ‘machine-readable dictionaries’ (MRDs). The former are mostly domain-specific and monolingual, while definitions in MRDs are generally given as a ‘genus term’ followed by a set of differentiae. Auxiliary technical aspects of the acquisition process are also covered, such as ‘lexical collocation’ and ‘association’, which refer to the deliberate co-occurrence of words that together form a new meaning and lose it whenever a synonym replaces either of the words. The first seminal work on collocation extraction from large text corpora appeared in the early 1990s and used inter-word mutual information to locate collocations. Abundant corpus data are available from the Linguistic Data Consortium (LDC).
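
The mutual-information idea mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes pointwise mutual information (PMI) over adjacent word pairs, log2(P(x, y) / (P(x) P(y))), estimated from raw counts. The function name `bigram_pmi`, the `min_count` threshold, and the toy corpus are all hypothetical and chosen only for illustration; they are not taken from the article.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (base 2).

    Assumption: probabilities are plain relative frequencies, and pairs
    seen fewer than `min_count` times are skipped, since PMI is unreliable
    for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus; a real study would use a large text corpus.
toy_corpus = (
    "strong tea is strong . powerful computer is powerful . "
    "strong tea and powerful computer"
).split()

scores = bigram_pmi(toy_corpus)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs with a high positive score co-occur more often than their individual frequencies would predict, which makes them collocation candidates. On a corpus this small the scores are not meaningful in themselves; the frequency threshold and the choice of adjacent pairs (rather than a wider window) are simplifications.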

