On the File Design Problem for Partial Match Retrieval

Hung-Chang Du

doi:10.1109/tse.1985.232197

Abstract

In the past two decades, the increasing use of databases and integrated information systems has encouraged the development of file structures suited to partial match retrieval. A partial match query is a query in which some attributes are specified and the rest are unspecified. One file structure that has recently been proposed and heavily studied is the multikey hashing scheme, but most previous results on designing optimal multikey hashing schemes ignored the record distribution of the file. In this paper we show that designing an optimal multikey hashing scheme that takes the record distribution into account is computationally intractable (NP-hard); therefore, a heuristic approach is necessary. In a multikey hashing scheme, the directory is space efficient and the search algorithm is fast, but because the directory carries insufficient information, some accessed buckets may not contain any record satisfying the given query, and the effort spent retrieving them is wasted. This paper introduces a new class of file structures that combine a multikey hashing scheme with an indexed descriptor technique. By adding extra information (either record descriptors or bucket descriptors) to the directory of a multikey hashing scheme, either only the buckets that contain at least one record satisfying the given query need to be accessed, or the number of accessed buckets that contain no such record is reduced.
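To make the abstract's mechanics concrete, here is a minimal Python sketch, under stated assumptions, of a multikey hashing file with bucket descriptors kept in the directory. It is not the paper's construction. The names `MultikeyHashFile`, `field_hash`, `value_code`, `sizes`, `desc_bits` and `bits_per_value` are hypothetical, the per-attribute hash is an assumed `zlib.crc32`-based function, and the descriptors use generic superimposed coding (each attribute value sets a few hashed bits, and a bucket descriptor is the OR of its records' codes). A bucket address concatenates one hash value per attribute. A partial match query fixes the component of each specified attribute and enumerates every value of the unspecified ones, so without descriptors it accesses the product of `sizes[i]` over the unspecified attributes. The descriptor test skips a bucket only when the bucket provably holds no matching record. False drops remain possible, which matches the "reduced" case in the abstract. The number of accessed buckets stands in, as an assumption, for retrieval cost.

```python
import zlib
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[str, ...]
Query = Sequence[Optional[str]]  # None marks an unspecified attribute


def field_hash(value: str, salt: int, modulus: int) -> int:
    """Assumed deterministic hash of one attribute value into range(modulus)."""
    return zlib.crc32(f"{salt}:{value}".encode()) % modulus


class MultikeyHashFile:
    """Hypothetical multikey hashing file with per-bucket descriptors.

    sizes[i] is the number of hash values for attribute i; a bucket address is
    (h_0(r[0]), ..., h_{k-1}(r[k-1])), so there are prod(sizes) buckets.
    """

    def __init__(self, sizes: Sequence[int], desc_bits: int = 32, bits_per_value: int = 2):
        self.sizes = list(sizes)
        self.desc_bits = desc_bits
        self.bits_per_value = bits_per_value
        self.buckets: Dict[Tuple[int, ...], List[Record]] = {}
        self.descriptors: Dict[Tuple[int, ...], int] = {}  # directory extra info

    def address(self, record: Record) -> Tuple[int, ...]:
        return tuple(field_hash(v, i, n) for i, (v, n) in enumerate(zip(record, self.sizes)))

    def value_code(self, attr: int, value: str) -> int:
        # Superimposed code: set bits_per_value hashed bit positions for this value.
        code = 0
        for j in range(self.bits_per_value):
            code |= 1 << field_hash(value, 1000 * (attr + 1) + j, self.desc_bits)
        return code

    def insert(self, record: Record) -> None:
        addr = self.address(record)
        self.buckets.setdefault(addr, []).append(record)
        desc = self.descriptors.get(addr, 0)
        for i, v in enumerate(record):
            desc |= self.value_code(i, v)  # bucket descriptor = OR of record codes
        self.descriptors[addr] = desc

    def candidate_buckets(self, query: Query) -> List[Tuple[int, ...]]:
        # A specified attribute fixes one hash value; an unspecified one ranges over all.
        ranges = [range(n) if q is None else [field_hash(q, i, n)]
                  for i, (q, n) in enumerate(zip(query, self.sizes))]
        return list(product(*ranges))

    def search(self, query: Query, use_descriptors: bool = True) -> Tuple[List[Record], int]:
        qdesc = 0
        for i, q in enumerate(query):
            if q is not None:
                qdesc |= self.value_code(i, q)
        accessed, results = 0, []
        for addr in self.candidate_buckets(query):
            if use_descriptors and (self.descriptors.get(addr, 0) & qdesc) != qdesc:
                continue  # a required bit is missing: no record here can match
            accessed += 1  # count this bucket as a retrieval
            for rec in self.buckets.get(addr, []):
                if all(q is None or q == v for q, v in zip(query, rec)):
                    results.append(rec)
        return results, accessed


# Usage: the second attribute is specified; the first and third are not.
demo = MultikeyHashFile(sizes=[4, 4, 2])
for rec in [("alice", "cs", "ny"), ("bob", "ee", "sf"), ("carol", "cs", "sf")]:
    demo.insert(rec)
query = [None, "cs", None]
hits, with_desc = demo.search(query)
_, without_desc = demo.search(query, use_descriptors=False)
print(sorted(hits), with_desc, without_desc)
```

Without descriptors this query touches all 4 × 2 = 8 candidate buckets. With descriptors it touches at most that many, and never fewer than the buckets that actually hold matches, because a matching record always contributes the query's bits to its bucket's descriptor.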
