Abstract
Due to the availability of larger RAM capacity, there is a new trend bringing parallel main memory database systems with higher performance, compared to traditional DBMSs. In parallel database systems the most critical aspect is data partitioning, which significantly impacts query processing time. Specifically, unbalanced data partitioning introduces data skew, which ends up decreasing query performance if not managed. In this work, we focus on optimizing range queries, widely used in P2P, decision support systems and spatio-temporal databases. We improve the communication complexity of the state-of-the-art previous algorithm based on skip graphs, which required O(log p) messages between 2 nodes to rebalance load, resulting in a high complexity O(p log p) to rebalance load on the p nodes. With such high cost in mind, we propose to create a global view of data distribution among all processing nodes and database clients. Our main contribution is the Approximate Partitioning Vector (\(\mathcal {APV}\)), which provides a global approximate view of data distribution to both processing nodes and database clients. A new data balancing algorithm, following a ring topology, reduces communication to 2 messages per node pair, resulting in O(1) communication complexity per node pair and O(p) globally among the p nodes. Experiments analyze the tradeoff between adjusting load balance and query performance.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.