Abstract

Rule-based systems have achieved success in applications such as information retrieval and natural language processing. However, because pattern matching is rigid, these systems typically require a large number of rules to adequately cover the variations of expression in unstructured text. Consequently, knowledge engineering for a new domain and knowledge maintenance for a fielded system are labor-intensive and expensive. In this paper, we present our research on enhancing a rule-based event coding system by relaxing the rigidity of pattern matching with a technique that formulates and matches patterns over the semantics of words instead of the literal words. Our technique pairs literal words with semantic vectors that accumulate word meaning from the contexts in which the word is used in dictionaries, ontologies, and domain corpora. Our method improves the speed, accuracy, and coverage of the event coding algorithm without additional knowledge engineering effort. Operating on semantics instead of syntax, the improved system eases the workload of human analysts who screen input text for critical events. Our algorithms are based on high-dimensional distributed representations, and their effectiveness and versatility derive from the unintuitive properties of such representations, that is, from the mathematical properties of high-dimensional spaces. Our current implementation encodes words, phrases, and rule patterns as semantic vectors using WordNet. We have started experimental evaluation using a large newswire dataset.
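
The idea of matching on semantic vectors rather than literal words can be illustrated with a minimal sketch. It is not the implementation described in the paper; it assumes a random-indexing style encoding in which every context word receives a random high-dimensional bipolar vector and a word's semantic vector is the sum of the vectors of the words in its definition. The glosses, the dimensionality `DIM`, the threshold, and the function names are all hypothetical and chosen only for illustration.

```python
import math
import random

DIM = 2000  # assumed dimensionality; random vectors this long are nearly orthogonal

# Hypothetical toy glosses standing in for dictionary or WordNet definitions.
GLOSSES = {
    "attack": "use armed force against an enemy to cause harm",
    "assault": "use violent armed force against an enemy or person",
    "banana": "long curved yellow fruit that grows on a tropical plant",
}


def index_vector(word, dim=DIM):
    """Return a random +1/-1 vector that is always the same for a given word."""
    rng = random.Random(word)  # string seeds are deterministic across runs
    return [rng.choice((-1, 1)) for _ in range(dim)]


def semantic_vector(word):
    """Accumulate the index vectors of every word in the gloss of `word`."""
    total = [0] * DIM
    for context_word in GLOSSES[word].split():
        for i, x in enumerate(index_vector(context_word)):
            total[i] += x
    return total


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def matches(pattern_word, text_word, threshold=0.4):
    """Relaxed match: accept the text word if it is semantically close enough."""
    similarity = cosine(semantic_vector(pattern_word), semantic_vector(text_word))
    return similarity >= threshold


if __name__ == "__main__":
    print(matches("attack", "assault"))  # glosses share most of their words
    print(matches("attack", "banana"))   # glosses share no words
```

In this sketch a rule written for "attack" also fires on "assault" because their glosses overlap, while an unrelated word stays close to orthogonal. This is the property of high-dimensional spaces that the abstract refers to: independent random vectors have a cosine near zero, so any substantial similarity reflects shared context.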

