Implementing knowledge bases on secondary storage (abstract only)

Jerry D Smith

doi:10.1145/322917.323038

Abstract

In the past, many knowledge representation (KR) schemes in artificial intelligence (AI) research have assumed a primary storage-resident knowledge base. This approach to knowledge base implementation is becoming less feasible with current demands and expectations for AI-based software, in particular, large knowledge systems. The incorporation of secondary storage-resident knowledge bases into knowledge systems requires considerable rethinking with respect to knowledge structures and knowledge system design. It is the premise of this paper that there are many potential applications for knowledge systems that are based on a tightly-coupled system design, in which traditional AI KR schemes are modified to incorporate file processing techniques, e.g., hashing and signatures.

Researchers in both AI and the database field are currently experimenting with a variety of approaches to “merging” AI and database technology. Essentially, there are four approaches:

(1) develop a simple interface between an AI development system, such as PROLOG, and a database system, such as INGRES, i.e., a loose-coupling;
(2) extend a database system to accommodate AI tasks, e.g., add inferencing capabilities;
(3) extend an AI development system by adding database capabilities; and
(4) develop a tightly-coupled system with its own AI and database capabilities.

Although these four approaches to incorporating secondary storage residency into knowledge systems are very important ones, there is also much potential for incorporating specialized file processing techniques, i.e., developing systems for applications that do not demand full database-level capabilities (a simplified approach to (4) above). For example, there may be applications that demand large knowledge bases, and therefore require the efficient use of secondary storage, but that do not require concurrent access, recovery capabilities, and so on.
In these cases, we believe that knowledge systems with specialized input/output processing, tailored to a particular KR scheme, will be completely adequate and self-contained, and possibly more efficient and cost-effective. Knowledge bases implemented in primary storage on computers with large virtual storage capabilities cannot substitute for the above.

There are many issues to be addressed in designing a knowledge system with secondary storage processing, based on either file or database processing techniques. One important issue concerns the lack of (traditional) primary keys during inferencing and query resolution [1]. We are currently investigating partitioning schemes with hashed partitions, where each partition contains “homogeneous” facts and/or rules and uses traditional AI KR schemes, e.g., frame-like knowledge structures, such as PROLOG structures, and tuples in PROLOG relations.

Another related issue is partition size, i.e., knowledge-partition resolution and granularity. Partition size can be determined by several criteria, including available primary storage, “natural” partition size of “like” knowledge, traditional bucket size criteria for file processing, and inferencing techniques.

Furthermore, the implementation language has a bearing on some of these issues. For example, with PROLOG there must be an explicit accommodation of the default backtracking technique, whereas with LISP the programmer is free to tailor the backtracking algorithm to the knowledge structures from the outset. As another example, consider the default indexing techniques used with PROLOG. Some PROLOG implementations index on the first component of a relation only [2]. Additional code must be introduced to supplement and properly override such defaults.

By tightly coupling traditional KR schemes with partitioning, indexing, and hashing schemes it will be possible to develop an efficient knowledge system capable of managing a large knowledge base.
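
As an illustration only, the idea of hashed partitions holding “homogeneous” facts on secondary storage might be sketched as follows. This is not the author's implementation: the class `PartitionedFactStore`, its method names, the JSON-lines file layout, and the choice of hashing on predicate name and arity are all assumptions made for the example. Each partition is a file; a query reads only the partition that its relation hashes to and then scans it, since any argument (not just a primary key) may be bound.

```python
import json
import os
import tempfile
import zlib


class PartitionedFactStore:
    """Toy fact store with one file per hashed partition (hypothetical design)."""

    def __init__(self, directory, num_partitions=8):
        self.directory = directory
        self.num_partitions = num_partitions
        os.makedirs(directory, exist_ok=True)

    def _partition_path(self, predicate, arity):
        # Assumption: hash on predicate name and arity so that "like" facts
        # land in the same partition. crc32 is stable across runs, unlike hash().
        key = f"{predicate}/{arity}".encode("utf-8")
        bucket = zlib.crc32(key) % self.num_partitions
        return os.path.join(self.directory, f"part_{bucket:03d}.jsonl")

    def add_fact(self, predicate, *args):
        """Append one fact, e.g. parent(tom, bob), to its partition file."""
        path = self._partition_path(predicate, len(args))
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps([predicate, list(args)]) + "\n")

    def query(self, predicate, *pattern):
        """Yield argument tuples matching the pattern; None is an unbound variable."""
        path = self._partition_path(predicate, len(pattern))
        if not os.path.exists(path):
            return
        with open(path, encoding="utf-8") as f:
            for line in f:
                name, args = json.loads(line)
                if name != predicate or len(args) != len(pattern):
                    continue  # a different relation hashed to the same bucket
                if all(p is None or p == a for p, a in zip(pattern, args)):
                    yield tuple(args)


def demo():
    """Build a small store in a temporary directory and run two queries."""
    with tempfile.TemporaryDirectory() as tmp:
        store = PartitionedFactStore(tmp, num_partitions=4)
        store.add_fact("parent", "tom", "bob")
        store.add_fact("parent", "tom", "liz")
        store.add_fact("parent", "bob", "ann")
        store.add_fact("likes", "ann", "prolog")
        # Second argument bound, first unbound: no primary key is available.
        by_child = list(store.query("parent", None, "ann"))
        # First argument bound, second unbound.
        by_parent = list(store.query("parent", "tom", None))
        return by_child, by_parent


if __name__ == "__main__":
    print(demo())
```

In this sketch the number of partitions stands in for the partition-size question raised above: fewer partitions mean larger files to scan per query, while more partitions mean smaller scans but more files to manage. A fuller design would also need an index or signature within each partition to avoid the linear scan.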
