Abstract

Due to the fast, indiscriminate growth of digital data, data reduction has attracted increasing attention and become a popular approach in large-scale storage systems. One of the most effective approaches to data reduction is data deduplication, in which redundant data at the file or sub-file level is detected and identified using a hash algorithm. Data deduplication has proven much more efficient than conventional compression in large-scale storage systems in terms of space reduction. The Two Threshold Two Divisor (TTTD) chunking algorithm is one of the most popular chunking algorithms used in deduplication, but it requires considerable time and system resources to compute its chunk boundaries. This paper presents new techniques to enhance the TTTD chunking algorithm: a new fingerprint function, a multi-level hashing and matching technique, and a new indexing technique for storing the metadata. These techniques combine four hashing algorithms to solve the collision problem and add a new chunking condition to the TTTD conditions in order to increase the number of small chunks, which in turn increases the deduplication ratio. This enhancement improves the deduplication ratio produced by the TTTD algorithm and reduces the system resources it requires. The proposed algorithm is evaluated in terms of deduplication ratio, execution time, and metadata size.
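To make the hash-based duplicate detection mentioned above concrete, the following sketch detects redundant chunks by their SHA-1 digests and computes a deduplication ratio. The ratio is defined here as total size over unique size, which is one common convention; the paper may use a different definition, and the `dedup_ratio` helper is illustrative only.

```python
import hashlib

def dedup_ratio(chunks):
    """Detect duplicate chunks by SHA-1 digest and report the
    deduplication ratio (total bytes / unique bytes -- one common
    convention; not necessarily the paper's exact definition)."""
    seen = set()
    total_bytes, unique_bytes = 0, 0
    for c in chunks:
        total_bytes += len(c)
        digest = hashlib.sha1(c).digest()
        if digest not in seen:       # first time we see this content
            seen.add(digest)
            unique_bytes += len(c)   # only unique content is stored
    return total_bytes / unique_bytes if unique_bytes else 1.0

# Duplicate content is stored once: 9 bytes in, 6 bytes stored.
print(dedup_ratio([b"abc", b"abc", b"xyz"]))  # → 1.5
```

A higher ratio means more redundancy was eliminated; chunking strategies such as TTTD aim to expose more of this redundancy by cutting data at content-defined boundaries.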

Highlights

  • There is an explosion in the amount of digital data in the world, as manifested by the considerable growth in the measured amount of stored data from 1.2 zettabytes in 2010 to 1.8 zettabytes in 2011 [1]; the amount of data projected to be created in 2020 is 44 zettabytes [2], [3]

  • According to a recent International Data Corporation (IDC) study, almost 80% of the surveyed corporations indicated that they use some kind of data deduplication technology in their storage systems to reduce redundant data, which increases storage efficiency and reduces storage costs [4]

  • The Two Threshold Two Divisor (TTTD) chunking algorithm with Rabin fingerprinting and the SHA-1 hashing algorithm was implemented in the same environment so that the results of the proposed system could be compared against it


Summary

INTRODUCTION

There is an explosion in the amount of digital data in the world, as manifested by the considerable growth in the measured amount of stored data from 1.2 zettabytes in 2010 to 1.8 zettabytes in 2011 [1]; the amount of data projected to be created in 2020 is 44 zettabytes [2], [3]. In 2010, Teng-Sheng Moh [5] added a new switch condition to improve the execution time of the TTTD algorithm while preserving the same deduplication ratio: the values of the main divisor (D) and the second divisor (Ddash) are halved when no break point has been found within the first 1600 bytes. This condition reduced the running time by about 6% and the number of large-sized chunks by about 50%. In 2017, AbdulSalam and Fahad [7] performed a survey of different chunking algorithms for data deduplication. They discussed and studied the most popular chunking algorithm, TTTD, and evaluated it using three different hashing functions (Rabin fingerprint, Adler, and SHA-1), implementing each as a fingerprinting and hashing algorithm and comparing the execution time and duplicate-elimination ratio.
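The TTTD chunking loop with Moh's switch condition can be sketched as follows. The thresholds (460/2800 bytes) and divisors (540/270) are the values commonly cited for TTTD, and the rolling hash is a simple polynomial stand-in for the Rabin fingerprint, so this is an illustrative sketch under those assumptions rather than the paper's implementation.

```python
# TTTD sketch: cut at a content-defined boundary between TMIN and TMAX,
# keeping a backup boundary from the second divisor; per Moh's switch
# condition, halve both divisors if no cut is found by the SWITCH point.
TMIN, TMAX = 460, 2800     # minimum / maximum chunk size (bytes)
D, DDASH = 540, 270        # main and second divisors
SWITCH = 1600              # Moh's switch point (bytes)
WINDOW = 48                # sliding-window size for the fingerprint
BASE, MOD = 257, (1 << 31) - 1

def chunk(data: bytes):
    chunks, start = [], 0
    pow_out = pow(BASE, WINDOW, MOD)   # coefficient of the byte leaving the window
    while start < len(data):
        d, ddash = D, DDASH            # divisors reset for every chunk
        backup = -1                    # last position matched by the second divisor
        h, broke = 0, False
        end = min(start + TMAX, len(data))
        cut = end                      # default: forced cut at TMAX (or EOF)
        for i in range(start, end):
            h = (h * BASE + data[i]) % MOD            # slide byte in
            if i - start >= WINDOW:
                h = (h - data[i - WINDOW] * pow_out) % MOD  # slide byte out
            length = i - start + 1
            if length < TMIN:
                continue               # never cut below the minimum threshold
            if length == SWITCH:       # Moh's condition: halve both divisors
                d, ddash = d // 2, ddash // 2
            if h % ddash == ddash - 1:
                backup = i             # remember a weaker (backup) boundary
            if h % d == d - 1:
                cut = i + 1            # strong boundary found: cut here
                broke = True
                break
        if not broke and end - start == TMAX and backup != -1:
            cut = backup + 1           # hit TMAX: fall back to the backup boundary
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because `ddash` is half of `d`, every main-divisor match is also a backup match, so a backup boundary is usually available when the maximum threshold is reached; this is what keeps TTTD's chunk-size distribution bounded on both sides.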

DATA DEDUPLICATION SYSTEM USING TTTD CHUNKING ALGORITHM
Hashing and Indexing
Matching
PROPOSED SYSTEM AND METHOD
RESULTS AND DISCUSSION
Evaluation Metrics
CONCLUSIONS AND FUTURE WORK