Abstract

Grounding in real data is one of the main factors that guarantee the quality and coverage of lexical resources, especially in specialized domains. Lexicon extraction from corpora is therefore a widely accepted method for building lexical resources. However, since validation by domain experts is a necessary step in specialized contexts, automatic screening of the data becomes fundamental to maximizing the informational value of the interaction with experts. In this presentation, we discuss a hybrid methodology, combining linguistic and statistical approaches, that focuses on extracting specialized lexical units and salient semantic information using CQL grammars. The method involves several steps, from frequency analysis and the extraction of concordances and collocations to manual revision and expert validation, and encompasses the construction and application of CQL grammars built on knowledge-based patterns. We present two CQL grammars for lexical and semantic information extraction, developed for Portuguese and Italian, and evaluate the results of their application to specialized corpora in the Public Art domain, demonstrating the value of this method for lexicon and semantic information extraction from large data.
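As an illustration only, the pattern-plus-frequency step can be sketched in a few lines of Python. This is not the grammars described above: the token list, the tagset, and the helper names `match_pattern` and `rank_candidates` are all hypothetical. The sketch matches a fixed part-of-speech sequence over tagged tokens, roughly what a CQL query such as `[tag="NOUN"] [tag="ADJ"]` would do (attribute names and tag values depend on the corpus annotation), and ranks the matched lemma sequences by frequency so that the most frequent candidates can be shown to experts first.

```python
from collections import Counter

# Hypothetical POS-tagged tokens as (word, lemma, tag) triples.
# The tagset and the sample sentence are assumptions for illustration.
TOKENS = [
    ("a", "o", "DET"),
    ("arte", "arte", "NOUN"),
    ("pública", "público", "ADJ"),
    ("e", "e", "CONJ"),
    ("a", "o", "DET"),
    ("escultura", "escultura", "NOUN"),
    ("urbana", "urbano", "ADJ"),
    ("da", "de", "ADP"),
    ("arte", "arte", "NOUN"),
    ("pública", "público", "ADJ"),
]


def match_pattern(tokens, tags):
    """Return the lemma tuple of every window whose tags equal `tags`."""
    n = len(tags)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(tok[2] == tag for tok, tag in zip(window, tags)):
            hits.append(tuple(tok[1] for tok in window))
    return hits


def rank_candidates(tokens, tags):
    """Count matched lemma sequences and sort them by descending frequency."""
    return Counter(match_pattern(tokens, tags)).most_common()


# Candidate term list for the NOUN + ADJ pattern, most frequent first.
candidates = rank_candidates(TOKENS, ["NOUN", "ADJ"])
```

Matching on lemmas rather than surface forms groups inflected variants of the same candidate term, which keeps the list passed on to manual revision shorter.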
