Abstract

When implementing computational lexicons, it is important to keep in mind the texts that an NLP system must deal with. Words relate to each other in many different, often odd ways; this information is rarely found in dictionaries, and it is quite hard to deduce a priori. In this paper we present a technique for the acquisition of statistically significant selectional restrictions from corpora and discuss the results of an experimental application with reference to two specific sublanguages (legal and commercial). We show that there are important cooccurrence preferences among words which cannot be established a priori, as they are determined for each choice of sublanguage. The method for detecting cooccurrences is based on the analysis of word associations augmented with syntactic markers and semantic tags. Word pairs are extracted by a morphosyntactic analyzer and clustered according to their semantic tags. A statistical measure is applied to the data to evaluate the significance of any relations detected. Selectional restrictions are acquired by a two-step process. First, statistically prevailing ‘coarse-grained’ conceptual patterns are used by a linguist to identify the relevant selectional restrictions in sublanguages. Second, semiautomatically acquired ‘coarse’ selectional restrictions are used as the ‘semantic bias’ of a system, ARIOSTO_LEX, for the automatic acquisition of a case-based semantic lexicon.
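The pipeline described above (word pairs with syntactic markers, clustering by semantic tag, then a statistical measure) can be illustrated with a minimal sketch. The abstract does not name the statistical measure, so pointwise mutual information is used here purely as an assumed stand-in. The marker `V_N`, the tag names, the toy word pairs, and the function `score_tag_patterns` are all hypothetical and are not taken from ARIOSTO_LEX.

```python
import math
from collections import Counter

# Hypothetical analyzer output: (head word, syntactic marker, modifier word).
TRIPLES = [
    ("pay", "V_N", "tax"),
    ("pay", "V_N", "fee"),
    ("sell", "V_N", "goods"),
    ("sign", "V_N", "contract"),
]

# Hypothetical coarse semantic tags, as a linguist might assign them by hand.
TAGS = {
    "pay": "TRANSFER", "sell": "TRANSFER", "sign": "COMMIT",
    "tax": "AMOUNT", "fee": "AMOUNT",
    "goods": "GOODS", "contract": "DOCUMENT",
}


def score_tag_patterns(triples, tags):
    """Cluster word pairs by semantic tag and score each tag pattern.

    Returns {(head_tag, marker, mod_tag): PMI in bits}. PMI is an assumed
    measure; counts are kept separately for each syntactic marker.
    """
    pattern = Counter()   # (head_tag, marker, mod_tag) -> count
    heads = Counter()     # (head_tag, marker) -> count
    mods = Counter()      # (marker, mod_tag) -> count
    totals = Counter()    # marker -> count
    for head, marker, mod in triples:
        h, m = tags[head], tags[mod]
        pattern[(h, marker, m)] += 1
        heads[(h, marker)] += 1
        mods[(marker, m)] += 1
        totals[marker] += 1

    scores = {}
    for (h, marker, m), n in pattern.items():
        # PMI = log2( P(h, m) / (P(h) * P(m)) ), all within one marker.
        expected = heads[(h, marker)] * mods[(marker, m)]
        scores[(h, marker, m)] = math.log2(n * totals[marker] / expected)
    return scores


if __name__ == "__main__":
    ranked = sorted(score_tag_patterns(TRIPLES, TAGS).items(),
                    key=lambda item: item[1], reverse=True)
    for tag_pattern, pmi in ranked:
        print(tag_pattern, round(pmi, 3))
```

In this sketch the high-scoring tag patterns play the role of the ‘coarse-grained’ conceptual patterns that a linguist would then inspect. On a corpus of realistic size, a frequency threshold or a significance test would be needed, since PMI is known to overrate rare pairs.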
