Abstract

NLFE to databases have failed in a commercial context, largely because of two reasons. Current approaches to the management of ambiguity by relying on inference over a world model create ungoing customisation requirements. Furthermore the design of NLFEs is subject to constraints which research in CL/ NLP does not address. In particular, standard parsing techniques (including robust ones) require complete lexica and cannot be deployed because data would create a constant need for dictionary update. The SQUIRREL [1] system (SERC Grant GR/E/ 69485) addresses some of these problems: its design reduces customisation effort as words are interpreted without reference to world models. The lexicon is assumed to be incomplete: unknown words are given interpretations by exploiting typing information contained in the datamodel. In addition, SQUIRREL demonstrates that NLFEs can allow for interrogation of integrity constraints, usually invisible to users. It is important to note that no new aspects of standard database management systems are involved SQUIRREl. intends to explore to what extent the state of the art in NLP/CL and Formal Semantics can be exploited in the design of NLFE to relational databases, under constraints imposed by good sofware engineering protocol. It aims to develop a modular, portable design, to plug in to public domain database technology, requiring minimal customisation. SQUIRREl. consists of a series of mappings translating NL expressions into SQL. Its highly modular design allows parts of the system to be ported without affecting other parts. Expressions in English are assigned syntactic and semantic representations on the basis of a lexicon and a context-free feature ba~d grammar. The lexicon is incomplete: unknown words are assigned tentative categories by the (bi-directional chart) parser. Syntactic and semantic rules operate in tandem. Semantic representations are cast in Property Theory (P'D [2], delivering intensional objects. These are assigned extensions in the form of first order logic (FOL) expressions. So far, the representations are independent from the domain model of any database in question. The FOL expressions are translated into the domain relational calculus (DRC), by rules exploiting the logical structure of the FOL formulae, and a domain model. The resulting expressions are translated into SQL by a simple syntactic transduction. The design offers several cut-off points at which modules can be re-deployed. The lexicon and granunar, currently written for a subset of English, can readily be customised for any language for which a context-free feature based grammar exists. The step via PT offers a second point where the system can be deployed to applications other than database interfaces. The mapping into the DRC makes it possible to port the system to any relational query language. The real advance made in this system is the economy of its datamodel. It sets out how each word in the dictionary is to be understood w.r.t, the current database by direct mapping: no world knowledge or inference is required. Unknown words are filled in by typing constraints associated with domains in the datamodel. No loss of expressiveness is entailed: this is hardly surprising as all a world model would seek to do is to (i) exaggerate ambiguity w.r.t, how a user might perceive the world, in order to (ii) reduce that ambiguity w.r.t. what the current database can provide. Under this view, step (i) is totally superfluous. The resulting gain in customisation effort is paramount. SQUIRREI.'s ambiguity management strategy is to offer users a choice between all interpretations that have survived the mapping into SQL. Note that at each stage in the mapping, alternative representations may emerge, or existing ones may die off. The most powerful disambiguation tool is the exploitation Of typing constraints associated with the database itself.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call