Abstract

The advent of (big) data management applications operating at Cloud scale has led to extensive research on the data placement problem. The key objective of data placement is to obtain a partitioning (possibly allowing for replicas) of a set of data-items across distributed nodes that minimizes the overall network communication cost. Although replication is intrinsic to data placement, the two have seldom been studied in combination. Instead, most existing solutions treat them as two independent problems and employ a two-phase approach: (1) data placement, followed by (2) replica placement. We address this gap by proposing a new paradigm, CDR, with the objective of combining data and replica placement as a single joint optimization problem. Specifically, we study two variants of the CDR problem: (1) CDR-Single, where the objective is to minimize the communication cost alone, and (2) CDR-Multi, which performs a multi-objective optimization to also minimize traffic and storage costs. To unify data and replica placement, we propose a generic framework called UnifyDR, which leverages overlapping correlation clustering to assign a data-item to multiple nodes, thereby allowing data and replica placement to be performed jointly. We establish the generic nature of UnifyDR by demonstrating its ability to address the CDR problem in two real-world use-cases: join-intensive online analytical processing (OLAP) queries and a location-based online social network (OSN) service. The effectiveness and scalability of UnifyDR are showcased by experiments performed on data generated using the TPC-DS benchmark for the OLAP use-case and on a trace of the Gowalla OSN for the OSN service use-case. Empirically, the presented approach obtains an improvement of approximately 35% in terms of the evaluated metrics and an 8-fold speed-up in comparison to state-of-the-art techniques.

Highlights

  • Although replication is an integral part of data placement, we identified that most techniques in the literature treat the two placement steps as independent problems rather than as a single joint optimization problem

  • We proposed two variants of the combined data and replica placement (CDR) problem, CDR-Single and CDR-Multi, which address use-cases in two real-world application domains: online analytical processing (OLAP) and online social network (OSN) services, respectively

  • To solve the CDR problem effectively, we proposed a generic framework, called UnifyDR, which unifies data and replica placement


Summary

MOTIVATION

We live in an information age, where almost every day-to-day need of an individual is fulfilled by digitally enabled services. Even for the OSN use-case, identifying a placement of user data to minimize the inter-node migrations triggered from profile visits or user mentions reduces to an instance of the data placement problem. Note that in both aforementioned applications, replication is required to ensure fault tolerance, while facilitating reduction in communication cost. We study a multi-objective optimization problem in the context of combined data and replica placement for OSN services, which is formally referred to as CDR-Multi. To solve both variants of the CDR problem, we propose a generic and unified framework, called UnifyDR, which leverages overlapping correlation clustering to address data and replica placement as a joint optimization problem. VI), to showcase the effectiveness and scalability of the proposed UnifyDR framework and its associated CDR placement algorithm in solving the CDR-Single and CDR-Multi problems.
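To make the idea of assigning a data-item to multiple nodes concrete, the following is a minimal illustrative sketch and not the paper's formulation. It assumes a toy cost model in which a pair of co-accessed data-items incurs its access weight as communication cost whenever the two items share no node, and in which storage cost is simply the total number of stored copies. The names `communication_cost`, `storage_cost`, and the example tables are hypothetical.

```python
def communication_cost(placement, co_access):
    """Toy communication cost (an assumption, not the paper's definition):
    sum the weights of co-accessed pairs whose node sets are disjoint."""
    cost = 0
    for (a, b), weight in co_access.items():
        # A pair is served locally if at least one node holds both items.
        if placement[a].isdisjoint(placement[b]):
            cost += weight
    return cost


def storage_cost(placement):
    """Toy storage cost: total number of copies kept across all nodes."""
    return sum(len(nodes) for nodes in placement.values())


# Hypothetical co-access weights between three data-items.
co_access = {
    ("orders", "customers"): 5,
    ("orders", "items"): 3,
    ("customers", "items"): 1,
}

# A disjoint partitioning: each item lives on exactly one node.
disjoint = {"orders": {0}, "customers": {1}, "items": {1}}

# An overlapping assignment: "orders" is additionally replicated on node 1.
overlapping = {"orders": {0, 1}, "customers": {1}, "items": {1}}

for name, placement in [("disjoint", disjoint), ("overlapping", overlapping)]:
    print(name, communication_cost(placement, co_access), storage_cost(placement))
```

Under these assumptions, the overlapping assignment removes the cross-node accesses at the price of one extra stored copy, which is the kind of trade-off a joint treatment of data and replica placement has to weigh.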

RELATED WORK
COMBINED DATA AND REPLICA PLACEMENT
PRELIMINARIES
CDR-SINGLE
CDR-MULTI
UnifyDR
OVERLAPPING CORRELATION CLUSTERING
COMBINED DATA AND REPLICA PLACEMENT ALGORITHM
GREEDY CLUSTER REFINEMENT
EXPERIMENTS
3) Results
CONCLUSION AND FUTURE WORK
