Optimizing distributed join queries: A genetic algorithm approach

Sangkyu Rho,Salvatore T March

doi:10.1023/a:1018967414664

Abstract

ptimizing join queries is a major problem in distributed database systems, particularly when files are replicated and copies stored at different nodes in the network. A distributed query optimization algorithm must select file copies and determine how and where those files will be processed. Process decisions include which files to reduce via semijoins, if any, the sites at which to perform join operations, and the order in which to perform those join operations. We extend the scope of distributed query optimization research by develop-ing a model that, for the first time, includes all of these design decisions and considers both communication and local processing costs. We develop a genetic algorithm-based solution procedure for this model which quickly determines efficient query processing plans. We demonstrate that ignoring local processing costs or restricting join processing to the result site, as commonly done in prior research, can result in inefficient query execution plans.

Full Text