Automated design of multidimensional clustering tables for relational databases

Sam S Lightstone, Bishwaranjan Bhattacharjee

doi:10.1016/b978-012088469-8.50102-9

Abstract

The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many online analytical processing (OLAP), warehousing, and decision support systems. An industrial implementation of this technique for the DB2® Universal Database™ (DB2 UDB) product, called multidimensional clustering (MDC), which co-exists with other classical forms of data storage and indexing, was described at VLDB 2003. This chapter describes the first published model for automating the selection of clustering keys in single-dimensional and multidimensional relational databases that use a cell/block storage structure for MDC. The automated MDC design model is based on what-if query cost modeling, data sampling, and a search algorithm for evaluating a large constellation of possible combinations. The model effectively trades the benefits of potential combinations of clustering keys against data sparsity and performance. It also effectively selects the granularity at which each dimension should be used for clustering. The chapter presents experimental results indicating that the model's design recommendations are of comparable quality to those made by human experts. The model has been implemented in the IBM® DB2 UDB for Linux®, UNIX®, and Windows® Version 8.2 release.
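To make the trade-off between clustering benefit and data sparsity concrete, here is a minimal sketch. It is not the DB2 design advisor's algorithm, only an illustration under stated assumptions. Each candidate dimension comes with a hypothetical benefit score, standing in for what a what-if cost model might estimate, and these scores are assumed to be additive. Every distinct cell is assumed to occupy whole blocks, so a partly filled last block in each cell is the cost of sparsity. Cell counts are taken directly from the rows passed in. Granularity is modeled by offering several coarsening functions for the same column (month vs. day). The names `estimate_blocks`, `choose_dimensions`, `max_expansion`, and `rows_per_block` are hypothetical, and exhaustive enumeration is used for simplicity.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def estimate_blocks(rows, dims, rows_per_block):
    """Estimate storage blocks if every distinct cell gets its own block(s).

    dims is a sequence of (name, column, coarsen_fn); coarsen_fn maps a row
    to that dimension's coordinate at the chosen granularity.
    """
    cells = Counter(tuple(fn(row) for _, _, fn in dims) for row in rows)
    # The partly filled last block of each cell is the sparsity cost.
    return sum(ceil(n / rows_per_block) for n in cells.values())


def choose_dimensions(rows, candidates, benefit, rows_per_block,
                      max_expansion=0.10, max_dims=3):
    """Exhaustively pick the highest-benefit combination whose estimated
    space growth over an unclustered table stays within max_expansion."""
    baseline = ceil(len(rows) / rows_per_block)
    best, best_benefit = (), 0.0
    for k in range(1, max_dims + 1):
        for combo in combinations(candidates, k):
            columns = [col for _, col, _ in combo]
            if len(set(columns)) < k:
                continue  # two granularities of the same column
            blocks = estimate_blocks(rows, combo, rows_per_block)
            if blocks > baseline * (1 + max_expansion):
                continue  # too sparse: too many nearly empty blocks
            # Assumption: per-dimension benefits simply add up.
            total = sum(benefit[name] for name, _, _ in combo)
            if total > best_benefit:
                best, best_benefit = combo, total
    return [name for name, _, _ in best], best_benefit


# Hypothetical table: 2 regions x 12 months x 28 days = 672 rows.
rows = [{"region": r, "date": f"2024-{m:02d}-{d:02d}"}
        for r in ("east", "west") for m in range(1, 13) for d in range(1, 29)]
candidates = [
    ("region", "region", lambda row: row["region"]),
    ("month", "date", lambda row: row["date"][:7]),  # coarse granularity
    ("day", "date", lambda row: row["date"]),        # fine granularity
]
# Hypothetical per-dimension benefits, e.g. estimated query cost savings.
benefit = {"region": 5.0, "month": 8.0, "day": 12.0}

print(choose_dimensions(rows, candidates, benefit, rows_per_block=4))
# → (['region', 'month'], 13.0)
```

In this toy run, day-level clustering has the largest individual benefit. However, it leaves one nearly empty block per date, so it exceeds the 10% space budget and is rejected. Region combined with month-level clustering fits the budget and wins. A production advisor would extrapolate cell counts from a sample rather than use raw counts. It would also need a smarter search than full enumeration once the number of candidates grows.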
