Abstract

In the era of "big data", the emergence and increasing adoption of the related enabling technologies make it possible for Map-Reduce to accommodate DSS (Decision Support System) workloads, which have traditionally been served by high-performance data warehouse analysis on RDBMSs. However, because the mapping of Map-Reduce tasks to physical machines is not predetermined, it is difficult to exploit the pre-partitioning and indexing techniques of a DBMS to improve data locality. In this paper, targeting OLAP (Online Analytical Processing) workloads that evaluate multi-way joins, we introduce table partitioning by reference to Map-Reduce. To avoid dispersing the initial tuples that share the same segment key, we present a detailed description of a data organization model that partitions the dominated tables by cascading reference constraints. To push multiple joins on these clustered partitions down to the map task, we design a one-pass multi-way join algorithm, along with optimized implementations for the major Map-Reduce stages. We conduct an empirical study with the TPC-H benchmark on clusters of different scales, and experimentally verify the efficiency of the proposed optimization model.
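
The idea of partitioning by reference and then joining inside each partition can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes three TPC-H-style tables (customer, orders, lineitem) held as in-memory lists of dicts, integer keys, and simplified column names (`custkey`, `orderkey`, `name`, `price`). The function names `partition_by_reference` and `map_side_join` are hypothetical.

```python
def partition_by_reference(customers, orders, lineitems, num_partitions):
    """Place every row in the partition of its root ancestor (assumed: customer)."""
    parts = [{"customer": [], "orders": [], "lineitem": []}
             for _ in range(num_partitions)]
    order_part = {}  # orderkey -> partition inherited from the parent customer

    # The root table is partitioned directly on its own key.
    for c in customers:
        parts[c["custkey"] % num_partitions]["customer"].append(c)

    # orders references customer, so each order follows its customer.
    for o in orders:
        p = o["custkey"] % num_partitions
        order_part[o["orderkey"]] = p
        parts[p]["orders"].append(o)

    # lineitem references orders, so the placement cascades one level further.
    for l in lineitems:
        parts[order_part[l["orderkey"]]]["lineitem"].append(l)

    return parts


def map_side_join(part):
    """Join the three tables of one partition in a single pass over lineitem."""
    cust = {c["custkey"]: c for c in part["customer"]}
    orders = {o["orderkey"]: o for o in part["orders"]}
    joined = []
    for l in part["lineitem"]:
        o = orders[l["orderkey"]]   # parent order is in the same partition
        c = cust[o["custkey"]]      # and so is the grandparent customer
        joined.append((c["name"], o["orderkey"], l["price"]))
    return joined
```

Because every child row is stored with its parent, each partition can be joined independently, which is what would allow a map task to evaluate the multi-way join without shuffling rows between machines.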
