Abstract

In data backup systems, incremental backup technology has become a focus of academic and industrial research because it avoids the bandwidth and processing-time overhead that full backup incurs when synchronizing backup data with source data. The key, and still poorly solved, problem for incremental backup is finding the incremental data between the backup and the source. To identify this incremental data during the backup process, this paper proposes a novel content-defined chunking algorithm. The source data and the backup data are chunked in the same way into small, variable-length chunks. Then, by checking whether a chunk of the source data differs from every chunk of the backup data, we can decide whether that chunk is incremental data. In our experiments, the proposed chunking algorithm is compared with classical and state-of-the-art algorithms. The results show that, at the same chunking throughput, the incremental data found by our algorithm is 13%–34% smaller than that found by the others.
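As a rough, hedged illustration of the comparison step described above (this is not the paper's algorithm; the SHA-256 hash choice and function names are assumptions), incremental data can be identified by checking each source chunk against the set of backup chunk hashes:

    import hashlib

    def find_incremental_chunks(source_chunks, backup_chunks):
        # Both inputs are lists of byte strings produced by the same
        # content-defined chunking algorithm.
        backup_hashes = {hashlib.sha256(c).digest() for c in backup_chunks}
        # A source chunk whose content matches no backup chunk is incremental data.
        return [c for c in source_chunks
                if hashlib.sha256(c).digest() not in backup_hashes]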

Highlights

  • A chunking algorithm avoids processing a whole large file at once by splitting the large file into several small chunks and handling one small chunk at a time [1], [2].

  • By sacrificing the stability of the chunk length, we propose a novel chunking algorithm that improves resistance to byte shifting, which helps an incremental backup system find less incremental data.

  • We propose a novel content-defined chunking algorithm, Minimal Incremental Interval (MII), which is applied to incremental synchronization between files (a generic chunking sketch follows below).
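The sketch below shows generic content-defined chunking with a Gear-style rolling hash. It only illustrates the idea of variable-length, content-defined boundaries; it is not the MII algorithm, and the mask and length limits are assumed values.

    import random

    random.seed(0)
    GEAR = [random.getrandbits(32) for _ in range(256)]  # per-byte random values

    def content_defined_chunks(data, mask=0x1FFF, min_len=2048, max_len=65536):
        # Declare a chunk boundary where the rolling hash matches the mask,
        # subject to minimum and maximum chunk lengths.
        chunks, start, h = [], 0, 0
        for i, b in enumerate(data):
            h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # old bytes fade out of the hash
            length = i - start + 1
            if (length >= min_len and (h & mask) == 0) or length >= max_len:
                chunks.append(data[start:i + 1])
                start, h = i + 1, 0
        if start < len(data):
            chunks.append(data[start:])  # trailing chunk
        return chunks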


Summary

INTRODUCTION

A chunking algorithm avoids processing a whole large file at once by splitting the large file into several small chunks and handling one small chunk at a time [1], [2]. One way to realize incremental synchronization is to record, in application code, every operation performed on the database and to replay all of those operations on the target server that needs to be synchronized. This approach is laborious and time-consuming for programmers, and a small programming error can break the whole system. An alternative approach [16] works as follows: the destination file (FileDes) is first divided into fixed-length chunks; next, both a strong and a weak hash value are computed for every chunk; then the checksum list composed of these hash values is transmitted to the source file server. The source file server computes the strong and weak hash values of the data in a sliding window of the same fixed length, compares them with the hash values in the checksum list, identifies the changed chunks, and transfers them to the target file server, where they are saved, thereby realizing incremental synchronization [16].
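A hedged sketch of this rsync-style exchange follows. The 4 KiB chunk size, Adler-32 weak hash, and MD5 strong hash are assumptions, and a production implementation would roll the weak checksum incrementally rather than recomputing it at every offset.

    import hashlib
    import zlib

    CHUNK = 4096  # assumed fixed chunk length

    def destination_signature(data):
        # Map weak hash -> set of strong hashes for the fixed-length chunks of FileDes.
        sig = {}
        for off in range(0, len(data), CHUNK):
            block = data[off:off + CHUNK]
            sig.setdefault(zlib.adler32(block), set()).add(hashlib.md5(block).digest())
        return sig

    def source_delta(data, sig):
        # Slide a fixed-length window over the source file; unchanged chunks are
        # reported as matches, everything else is literal data to transfer.
        ops, i = [], 0
        while i + CHUNK <= len(data):
            window = data[i:i + CHUNK]
            weak = zlib.adler32(window)  # cheap weak check first
            if weak in sig and hashlib.md5(window).digest() in sig[weak]:
                ops.append(('match', i))  # chunk already present on the target
                i += CHUNK
            else:
                ops.append(('literal', data[i:i + 1]))
                i += 1
        if i < len(data):
            ops.append(('literal', data[i:]))
        return ops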

BACKGROUND
TIME AND SPACE COMPLEXITY
EXPERIMENTS
CONCLUSION

