Abstract

Data partitioning is an effective way to reduce cost and improve query performance in large-scale Web data analytical applications. State-of-the-art partitioning approaches for range queries do not consider the correlation of data under particular access patterns, especially skewed ones. This paper presents a correlation-aware partitioning model for skewed range queries. It formulates the partitioning optimization problem on continuous correlated data as a geometric step-curve fitting problem. We then prove that the optimal partitioning splits data on range query boundaries. On this basis, Range Boundary Based DP Partitioning is designed to obtain the optimal partition while significantly reducing the computation cost compared with the baseline dynamic programming algorithm. For efficiency, Bottom-up Merging Partitioning is further proposed, which improves partitioning by bottom-up merging instead of searching. To evaluate the proposed approaches, experiments are conducted under skewed range query workloads on skewed and uniform datasets; the results show that the approaches improve the efficiency of data partitioning by hundreds of times.
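
To make the step-curve view concrete, the following is a minimal sketch of a dynamic program that only considers range query boundaries as split points. It is not the paper's algorithm: the cost model (squared error between per-key access counts and a per-partition mean), the integer key domain, the half-open query ranges, and all function names (`access_counts`, `boundary_candidates`, `fit_steps`) are assumptions made for illustration.

```python
def access_counts(n_keys, queries):
    """Number of queries touching each key in [0, n_keys); queries are half-open [lo, hi)."""
    diff = [0] * (n_keys + 1)
    for lo, hi in queries:
        diff[lo] += 1
        diff[hi] -= 1
    counts, running = [], 0
    for d in diff[:n_keys]:
        running += d
        counts.append(running)
    return counts


def boundary_candidates(n_keys, queries):
    """Candidate split points: the domain ends plus every query boundary."""
    points = {0, n_keys}
    for lo, hi in queries:
        points.update((lo, hi))
    return sorted(points)


def fit_steps(n_keys, queries, k):
    """Fit a step curve with up to k steps, cutting only at query boundaries.

    Assumed cost (hypothetical): sum of squared deviations of the access
    counts from their mean inside each partition.
    Returns (partition boundaries, total cost).
    """
    counts = access_counts(n_keys, queries)
    cuts = boundary_candidates(n_keys, queries)
    m = len(cuts)
    k = min(k, m - 1)  # cannot have more partitions than candidate intervals

    # Prefix sums give the squared error of any segment in O(1).
    ps, ps2 = [0], [0]
    for c in counts:
        ps.append(ps[-1] + c)
        ps2.append(ps2[-1] + c * c)

    def sse(a, b):
        s = ps[b] - ps[a]
        return (ps2[b] - ps2[a]) - s * s / (b - a)

    inf = float("inf")
    # best[j][i]: minimal cost of covering [0, cuts[i]) with exactly j partitions.
    best = [[inf] * m for _ in range(k + 1)]
    back = [[None] * m for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, m):
            for p in range(i):
                if best[j - 1][p] == inf:
                    continue
                cost = best[j - 1][p] + sse(cuts[p], cuts[i])
                if cost < best[j][i]:
                    best[j][i] = cost
                    back[j][i] = p

    # Walk the back-pointers to recover the chosen boundaries.
    bounds, i = [cuts[m - 1]], m - 1
    for j in range(k, 0, -1):
        i = back[j][i]
        bounds.append(cuts[i])
    bounds.reverse()
    return bounds, best[k][m - 1]
```

With `m` distinct boundaries, the search runs over `m` candidates rather than over every key, which is the kind of reduction the abstract attributes to restricting splits to range query boundaries. The exact objective and pruning used in the paper may differ from this sketch.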
