An Automatically Generated Lexical Knowledge Base with Soft Definitions

Martin Scaiano

doi:10.20381/ruor-5778

Martin Scaiano

PDF Available

https://doi.org/10.20381/ruor-5778

Copy DOI

Export

Save

Cite

Publication Date: Jan 1, 2016

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

There is a need for methods that understand and represent the meaning of text for use in Artificial Intelligence (AI). This thesis demonstrates a method to automatically extract a lexical knowledge base from dictionaries for the purpose of improving machine reading. Machine reading refers to a process by which a computer processes natural language text into a representation that supports inference or inter-connection with existing knowledge (Clark and Harrison, 2010). There are a number of linguistic ideas associated with representing and applying the meaning of words which are unaddressed in current knowledge representations. This work draws heavily from the linguistic theory of frame semantics (Fillmore, 1976). A word is not a strictly defined construct; instead, it evokes our knowledge and experiences, and this information is adapted to a given context by human intelligence. This can often be seen in dictionaries, as a word may have many senses, but some are only subtle variations of the same theme or core idea. Further unaddressed issue is that sentences may have multiple reasonable and valid interpretations (or readings). This thesis postulates that there must be algorithms that work with symbolic representations which can model how words evoke knowledge and then contextualize that knowledge. I attempt to answer this previously unaddressed question, “How can a symbolic representation support multiple interpretations, evoked knowledge, soft word senses, and adaptation of meaning?” Furthermore, I implement and evaluate the proposed solution. This thesis proposes the use of a knowledge representation called Multiple Interpretation Graphs (MIGs), and a lexical knowledge structure called auto-frames to support contextualization. MIG is used to store a single auto-frame, the representation of a sentence, or an entire text. MIGs and auto-frames are produced from dependency parse trees using an algorithm I call connection search. MIG supports representing multiple different interpretations of a text, while auto-frames combine multiple word senses and information related to the word into one representation. Connection search contextualizes MIGs and auto-frames, and reduces the number of interpretations that are considered valid. In this thesis, as proof of concept and evaluation, I extracted auto-frames from Longman Dictionary of Contemporary English (LDOCE ). I take the point of view that a word’s meaning depends on what it is connected to in its definition. I do not use a The term machine reading was coined by Etzioni et al. (2006).

Full Text