A query processing algorithm for a system of heterogeneous distributed databases

Csaba J Egyhazy, Konstantinos P Triantis, Bharat Bhasker

doi:10.1007/bf00122148

Abstract

This paper presents a query processing algorithm formulated and developed in support of the prototype architecture of the Distributed Access View Integrated Database (DAVID), a heterogeneous distributed database management system. The objective of the algorithm is to produce an inexpensive strategy for a given query. This strategy is obtained primarily by computing the most profitable semi-joins and by determining the best sequence of join operations per processing site. The join sequence is obtained by applying a zero-one integer linear program that uses a non-parametric statistical estimation technique to compute the sizes of the temporary clusters. A cluster is a subset of the Cartesian product of a list of atomic and non-atomic domains; it is the structure that can uniformly represent data stored in relational, hierarchical, and network databases. Following some background information on the development of the DAVID prototype, the paper introduces the schema architecture, which describes the mechanism by which the component heterogeneous database schemata are mapped into the uniform global schema. This is followed by the formulation of the query processing algorithm, its implementation, and an illustration of its use in the context of NASA's Astrophysics Data System.
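
To make the notion of a "profitable" semi-join concrete, the following is a minimal Python sketch of the textbook idea, not the DAVID implementation. It assumes a simplified cost model in which benefit is the number of rows no longer shipped and cost is the number of distinct join values sent to the other site; the abstract does not state the paper's actual cost model. The function names `semijoin` and `semijoin_profit` and the sample rows are hypothetical.

```python
def semijoin(r, s, attr):
    """Return the rows of r whose attr value also appears in s."""
    s_values = {row[attr] for row in s}
    return [row for row in r if row[attr] in s_values]


def semijoin_profit(r, s, attr):
    """Estimate profit under an assumed, simplified cost model.

    benefit: rows of r that no longer need to be shipped
    cost:    distinct join values of s that must be sent to r's site
    """
    shipped_values = {row[attr] for row in s}
    benefit = len(r) - len(semijoin(r, s, attr))
    cost = len(shipped_values)
    return benefit - cost


# Hypothetical sample data, invented for illustration only.
stars = [
    {"star_id": 1, "name": "Vega"},
    {"star_id": 2, "name": "Sirius"},
    {"star_id": 3, "name": "Altair"},
    {"star_id": 4, "name": "Deneb"},
    {"star_id": 5, "name": "Rigel"},
]
observations = [
    {"obs_id": 10, "star_id": 2},
    {"obs_id": 11, "star_id": 2},
]

reduced = semijoin(stars, observations, "star_id")
profit = semijoin_profit(stars, observations, "star_id")
```

Under this assumed model, a semi-join is worth performing only when the profit is positive, that is, when the reduction removes more from the transfer than the join values cost to send.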

Full Text