Abstract

Multiword expressions are an omnipresent element of natural language, and their construal as a linguistic resource is important in various applications. This paper presents MwTExt, an architecture for the automatic extraction of multi-word terms (MWTs) from such expressions in unannotated English documents. Natural language processing techniques such as shallow parsing and syntactic structure analysis are used to extract MWTs, with a specific focus on the lexical patterns (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The extracted MWTs can further be used to form compound concepts within an ontology. The lexical descriptions of the MWTs are encoded in the Web Ontology Language (OWL/XML). MwTExt has been tested on Computer Science domain texts, and its results are compared with those of Text2Onto, an ontology learning tool, and of term extractors such as TermRaider and TerMine. The results indicate that MwTExt performs better at extracting accurate lexicalized MWTs, with an average precision of 97%.
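
To illustrate the three lexical patterns named in the abstract, the following is a minimal sketch of pattern matching over part-of-speech tagged tokens. It is not the MwTExt implementation: it assumes the input has already been tagged with Penn Treebank style tags (any `NN*` tag counts as a noun, `IN` as a preposition), and the names `tag_class`, `MWT_PATTERN` and `extract_mwts` are hypothetical, introduced only for this example.

```python
import re

def tag_class(tag):
    """Collapse a Penn Treebank style tag into one letter (assumed mapping):
    N = noun, P = preposition, O = anything else."""
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"

# One regex over the letter string covers the three patterns:
#   N P N        (Noun Preposition Noun)
#   N P N N      (Noun Preposition Noun + Noun)
#   N P N P N    (Noun Preposition Noun Preposition Noun)
MWT_PATTERN = re.compile(r"NPN(?:PN|N)?")

def extract_mwts(tagged_tokens):
    """Return candidate multi-word terms from a list of (word, tag) pairs."""
    # Build one letter per token so regex offsets map directly to token indices.
    classes = "".join(tag_class(tag) for _, tag in tagged_tokens)
    terms = []
    for match in MWT_PATTERN.finditer(classes):
        words = [word for word, _ in tagged_tokens[match.start():match.end()]]
        terms.append(" ".join(words))
    return terms

# Hand-tagged example sentence (tags are supplied by hand, not by a tagger).
sentence = [
    ("The", "DT"), ("theory", "NN"), ("of", "IN"),
    ("computation", "NN"), ("is", "VBZ"), ("central", "JJ"),
]
print(extract_mwts(sentence))  # ['theory of computation']
```

In this sketch the regex prefers the longer patterns when they apply, so a span such as "point of view of user" is returned as one candidate, not as two overlapping ones. A real pipeline would obtain the tags from a shallow parser and would likely filter candidates further before encoding them in an ontology.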
