Abstract

Due to the fast indiscriminate increase of digital data, data reduction has acquired increasing concentration and became a popular approach in large-scale storage systems. One of the most effective approaches for data reduction is Data Deduplication technique in which the redundant data at the file or sub-file level is detected and identifies by using a hash algorithm. Data Deduplication showed that it was much more efficient than the conventional compression technique in large-scale storage systems in terms of space reduction. Two Threshold Two Divisor (TTTD) chunking algorithm is one of the popular chunking algorithm used in deduplication. This algorithm needs time and many system resources to compute its chunk boundary. This paper presents new techniques to enhance TTTD chunking algorithm using a new fingerprint function, a multi-level hashing and matching technique, new indexing technique to store the Metadata. These new techniques consist of four hashing algorithm to solve the collision problem and adding a new chunk condition to the TTTD chunking conditions in order to increase the number of the small chunks which leads to increasing the Deduplication Ratio. This enhancement improves the Deduplication Ratio produced by TTTD algorithm and reduces the system resources needed by this algorithm. The proposed algorithm is tested in terms of Deduplication Ratio, execution time, and Metadata size.

Highlights

  • There is an explosion on the amount of digital data in the world as manifest by the considerable growth in the measured amount of stored data in 2010 and 2011 from 1.2 zettabytes to 1.8 zettabytes 1, respectively [1], and the prophesied amount of data to be created in 2020 is 44 zettabytes [2], [3]

  • According to International Data Corporation (IDC) recent study, almost 80% of the surveyed corporations indicated that they are using in their storage systems to reduce redundant data kind of data deduplication technologies, which increased storage in an efficient way and reduced the costs of storage spaces [4]

  • Two Threshold Two Divisor (TTTD) chunking algorithm with Rabin fingerprint and SHA-1 hashing algorithm implemented in the same environment to compare the result of the proposed system with it

Read more

Summary

INTRODUCTION

There is an explosion on the amount of digital data in the world as manifest by the considerable growth in the measured amount of stored data in 2010 and 2011 from 1.2 zettabytes to 1.8 zettabytes 1 , respectively [1], and the prophesied amount of data to be created in 2020 is 44 zettabytes [2], [3]. Teng-Sheng Moh [5] in 2010 adds a new switch condition to enhance the execution time of TTTD algorithm with the same deduplication ratio. He reduced the value of the main divisor (D) and the second divisor (Ddash) to the half when the break point was not found before 1600 byte, this condition reduced about 6% of the running time and 50% of the largesized chunks. AbdulSalam and Fahad [7], in 2017 performed a survey on different chunking algorithms of data deduplication They discussed, studied the most popular chunking algorithm TTTD, and evaluated this algorithm using three different hashing functions; Rabin Finger print, Adler, and SHA1 implemented each one as a fingerprinting and hashing algorithm and compared the execution time and deduplication elimination ratio.

DATA DEDUPLICATION SYSTEM USING TTTD CHUNKING ALGORITHM
Hashing and Indexing
Matching
PROPOSED SYSTEM AND METHOD
RESULTS AND DISCUSSION
Evaluation Metrics
CONCLUSIONS AND FUTURE WORK

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.