Abstract

In data backup systems, incremental backup technology has become a focus of academic and industrial research because it avoids the bandwidth and processing-time overhead that full backup incurs when synchronizing backup data with source data. The key, and still poorly solved, problem for incremental backup is finding the incremental data between the backup and the source. To identify this incremental data during the backup process, this paper proposes a novel content-defined chunking algorithm. The source data and the backup data are chunked in the same way into small, variable-length chunks. Then, by checking whether a chunk of the source data differs from every chunk of the backup data, we can decide whether that chunk is incremental data. In our experiments, the proposed chunking algorithm is compared with classical and state-of-the-art algorithms. The results show that, at the same chunking throughput, the incremental data found by our algorithm is 13%–34% smaller than that found by the others.
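As a rough, hedged illustration of the comparison step described above (this is not the paper's algorithm; the SHA-256 hash choice and function names are assumptions), incremental data can be identified by checking each source chunk against the set of backup chunk hashes:

    import hashlib

    def find_incremental_chunks(source_chunks, backup_chunks):
        # Both inputs are lists of byte strings produced by the same
        # content-defined chunking algorithm.
        backup_hashes = {hashlib.sha256(c).digest() for c in backup_chunks}
        # A source chunk whose content matches no backup chunk is incremental data.
        return [c for c in source_chunks
                if hashlib.sha256(c).digest() not in backup_hashes]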

Highlights

  • A chunking algorithm avoids processing a whole large file at once by splitting the large file into several small chunks and handling one small chunk at a time [1], [2].

  • By sacrificing the stability of the chunk length, we propose a novel chunking algorithm that improves resistance to byte shifting, which helps an incremental backup system find less incremental data.

  • We propose a novel content-defined chunking algorithm, Minimal Incremental Interval (MII), which is applied to incremental synchronization between files (a generic chunking sketch follows below).
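The sketch below shows generic content-defined chunking with a Gear-style rolling hash. It only illustrates the idea of variable-length, content-defined boundaries; it is not the MII algorithm, and the mask and length limits are assumed values.

    import random

    random.seed(0)
    GEAR = [random.getrandbits(32) for _ in range(256)]  # per-byte random values

    def content_defined_chunks(data, mask=0x1FFF, min_len=2048, max_len=65536):
        # Declare a chunk boundary where the rolling hash matches the mask,
        # subject to minimum and maximum chunk lengths.
        chunks, start, h = [], 0, 0
        for i, b in enumerate(data):
            h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # old bytes fade out of the hash
            length = i - start + 1
            if (length >= min_len and (h & mask) == 0) or length >= max_len:
                chunks.append(data[start:i + 1])
                start, h = i + 1, 0
        if start < len(data):
            chunks.append(data[start:])  # trailing chunk
        return chunks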


Summary

INTRODUCTION

A chunking algorithm avoids processing a whole large file at once by splitting the large file into several small chunks and handling one small chunk at a time [1], [2]. One way to realize incremental synchronization is to record, in application code, every operation performed on the database and to replay all of those operations on the target server that needs to be synchronized. This approach is laborious and time-consuming for programmers, and a small programming error can break the whole system. An alternative approach [16] works as follows: the destination file (FileDes) is first divided into fixed-length chunks; next, both a strong and a weak hash value are computed for every chunk; then the checksum list composed of these hash values is transmitted to the source file server. The source file server computes the strong and weak hash values of the data in a sliding window of the same fixed length, compares them with the hash values in the checksum list, identifies the changed chunks, and transfers them to the target file server, where they are saved, thereby realizing incremental synchronization [16].
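A hedged sketch of this rsync-style exchange follows. The 4 KiB chunk size, Adler-32 weak hash, and MD5 strong hash are assumptions, and a production implementation would roll the weak checksum incrementally rather than recomputing it at every offset.

    import hashlib
    import zlib

    CHUNK = 4096  # assumed fixed chunk length

    def destination_signature(data):
        # Map weak hash -> set of strong hashes for the fixed-length chunks of FileDes.
        sig = {}
        for off in range(0, len(data), CHUNK):
            block = data[off:off + CHUNK]
            sig.setdefault(zlib.adler32(block), set()).add(hashlib.md5(block).digest())
        return sig

    def source_delta(data, sig):
        # Slide a fixed-length window over the source file; unchanged chunks are
        # reported as matches, everything else is literal data to transfer.
        ops, i = [], 0
        while i + CHUNK <= len(data):
            window = data[i:i + CHUNK]
            weak = zlib.adler32(window)  # cheap weak check first
            if weak in sig and hashlib.md5(window).digest() in sig[weak]:
                ops.append(('match', i))  # chunk already present on the target
                i += CHUNK
            else:
                ops.append(('literal', data[i:i + 1]))
                i += 1
        if i < len(data):
            ops.append(('literal', data[i:]))
        return ops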

BACKGROUND
TIME AND SPACE COMPLEXITY
EXPERIMENTS
CONCLUSION

