Cost Effective Load-Balancing Approach for Range-Partitioned Main-Memory Resident Data

Djahida Belayadi, Khaled-Walid Hidouci, Carlos Ordonez, Ladjel Bellatreche

doi:10.1007/978-3-319-98812-2_20

Abstract

Growing RAM capacity has driven a trend toward parallel main-memory database systems, which offer higher performance than traditional DBMSs. In parallel database systems the most critical aspect is data partitioning, which significantly impacts query processing time. Specifically, unbalanced data partitioning introduces data skew, which degrades query performance if left unmanaged. In this work, we focus on optimizing range queries, which are widely used in P2P systems, decision support systems and spatio-temporal databases. We improve the communication complexity of the previous state-of-the-art algorithm based on skip graphs, which requires O(log p) messages between two nodes to rebalance load, for a total of O(p log p) messages to rebalance load across the p nodes. With this high cost in mind, we propose to maintain a global view of the data distribution that is shared by all processing nodes and database clients. Our main contribution is the Approximate Partitioning Vector (\(\mathcal{APV}\)), which provides this global approximate view of the data distribution to both processing nodes and database clients. A new data balancing algorithm, following a ring topology, reduces communication to 2 messages per node pair, resulting in O(1) communication complexity per node pair and O(p) globally among the p nodes. Experiments analyze the tradeoff between adjusting load balance and query performance.
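To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is illustrative only and is not the authors' implementation. It assumes the partitioning vector can be modeled as a sorted list of p-1 split keys, where node i holds the keys from split i-1 (inclusive) up to split i (exclusive). The function names `locate_node`, `nodes_for_range`, `skip_graph_messages` and `ring_messages` are hypothetical, and the message counts set all hidden constants to 1 and assume one neighbouring pair per node on the ring.

```python
import bisect
import math


def locate_node(apv, key):
    """Return the index of the node whose range contains `key`.

    Assumption: `apv` is a sorted list of p-1 split keys, and node i
    holds keys k with apv[i-1] <= k < apv[i].
    """
    return bisect.bisect_right(apv, key)


def nodes_for_range(apv, low, high):
    """Return the indices of the nodes a range query [low, high] must visit."""
    return list(range(locate_node(apv, low), locate_node(apv, high) + 1))


def skip_graph_messages(p):
    """Rough global cost when each node pair needs about log2(p) messages."""
    return p * math.ceil(math.log2(p))


def ring_messages(p):
    """Global cost with 2 messages per neighbouring pair on a ring of p nodes."""
    return 2 * p


if __name__ == "__main__":
    apv = [100, 200, 300]  # hypothetical split keys for p = 4 nodes
    print(locate_node(apv, 150))
    print(nodes_for_range(apv, 150, 250))
    for p in (16, 256, 4096):
        print(p, skip_graph_messages(p), ring_messages(p))
```

Because every client holds a copy of the vector, a lookup is a local binary search rather than a multi-hop traversal. Since the view is only approximate, a real system would also need a way to handle a stale entry, for example by forwarding the request to a neighbouring node. The sketch omits that step.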

Full Text