Pushing Collaborative Data Deduplication to the Network Edge: An Optimization Framework and System Design

Shijing Li, Tian Lan, Moo-Ryong Ra, Hee Won Lee, Rajesh Krishna Panta, Bharath Balasubramanian

doi:10.1109/tnse.2022.3155357

Open Access

https://doi.org/10.1109/tnse.2022.3155357

Abstract

Edge computing has become a new computing paradigm with explosive growth in recent years. We consider the problem of pushing data deduplication to the network edge and propose a new framework for distributed edge-facilitated deduplication (EF-dedup). Deduplication at the network edge allows us to exploit the high degree of geographic and temporal correlation in edge data to achieve space efficiency. By leveraging the distributed computing power available at the edge in a collaborative fashion, the edge nodes can effectively suppress duplicated edge data, consuming considerably less space and WAN bandwidth. To this end, we partition the edge nodes into disjoint collaborative clusters, maintain a deduplication index structure across them using a distributed key-value store, and perform deduplication within those clusters. However, this partitioning problem is challenging and requires the optimization of a novel tradeoff: edge nodes with highly correlated data may not always be within the same edge cloud, and the network cost among them is non-trivial. We formulate a joint storage and network optimization problem with different design objectives, such as arbitrary partitioning and balanced partitioning of edge nodes. The problem is shown to be NP-hard in general. We then develop an optimization framework with efficient algorithms and prove that it achieves a closed-form competitive ratio.
Our experiments, performed on edge nodes in a corporate lab and a central cloud at AWS, demonstrate that EF-dedup achieves 67.4% to 133.7% higher deduplication throughput than purely cloud-based techniques and 20.0% to 62.6% lower aggregate cost in terms of the network-storage trade-off than approaches that favor only one of the two.
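
The storage-versus-network trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's formulation or algorithm: the cost model (unique chunks stored per cluster plus a weighted sum of pairwise latencies inside each cluster), the weight `alpha`, the 4-byte chunk size, and the node names and latencies are all assumptions chosen for illustration.

```python
import hashlib
from itertools import combinations


def fingerprints(data: bytes, chunk_size: int = 4) -> set[str]:
    """Split data into fixed-size chunks and return their SHA-256 digests."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def aggregate_cost(partition, chunks, latency, alpha):
    """Hypothetical cost: unique chunks per cluster + alpha * intra-cluster latency."""
    storage = 0
    network = 0.0
    for cluster in partition:
        # Deduplication happens only within a cluster, so each cluster
        # stores the union of its members' chunk fingerprints.
        unique = set().union(*(chunks[node] for node in cluster))
        storage += len(unique)
        # Assumed network cost: sum of pairwise latencies inside the cluster.
        network += sum(
            latency[frozenset(pair)] for pair in combinations(sorted(cluster), 2)
        )
    return storage + alpha * network


# Three hypothetical edge nodes: A and B hold similar data and are close
# to each other; C holds unrelated data and is far from both.
chunks = {
    "A": fingerprints(b"aaaabbbbcccc"),
    "B": fingerprints(b"aaaabbbbdddd"),
    "C": fingerprints(b"eeeeffffgggg"),
}
latency = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "C"}): 10.0,
    frozenset({"B", "C"}): 10.0,
}
alpha = 0.5

cost_separate = aggregate_cost([{"A"}, {"B"}, {"C"}], chunks, latency, alpha)
cost_single = aggregate_cost([{"A", "B", "C"}], chunks, latency, alpha)
cost_mixed = aggregate_cost([{"A", "B"}, {"C"}], chunks, latency, alpha)

print(cost_separate, cost_single, cost_mixed)  # 9.0 17.5 7.5
```

Under these assumed numbers, clustering only the two correlated, nearby nodes gives the lowest aggregate cost: isolating every node forgoes deduplication, while a single large cluster pays a high network cost for no additional storage savings.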

Journal: IEEE Transactions on Network Science and Engineering	Publication Date: Jul 1, 2022
Citations: 2	License type: publisher-specific-oa

Similar Papers

EF-Dedup: Enabling Collaborative Data Deduplication at the Network Edge
Shijing Li ... Bharath Balasubramanian
01 Jul 2019

Delay Constrained Hybrid CRAN: A Functional Split Optimization Framework
Abdulrahman Alabbasi ... Miguel Berg
01 Dec 2018

Query-driven Edge Node Selection in Distributed Learning Environments
Tahani Aladwani ... Kostas Kolomvatsos
01 Apr 2023

CLOSED: A Cloud-Edge Dynamic Collaborative Strategy for Complex Event Detection
Jian Cao ... Shiyou Qian
01 Jul 2022
