Abstract

As the volume of digital data grows every day, managing the storage devices that hold it has become very difficult. Deduplication plays a crucial role in removing redundancy from large-scale cluster storage. Existing deduplication approaches based on overlapping algorithms are inefficient in many situations: they consume a large amount of memory and processing time. Real-time data is often incomplete, inconsistent, or missing certain behaviors or trends, and frequently contains significant errors. In the deduplication process, data pre-processing transforms raw data into a comprehensible format in which duplicates are easy to analyze. For this reason, data deduplication clusters have been adopted in storage systems for records and data backup. Most research in this field focuses on deduplication clusters that reduce replicated data in order to save server memory, and the pattern-matching deduplication clustering process is especially popular. This chapter discusses the overlapping algorithm and shows how the proposed multi-level pattern-matching algorithm (MLPMA) deduplicates large amounts of data with higher efficiency. The technique combines similarity with locality by applying a Bloom filter to the deduplication cluster, which allows redundant data to be detected and removed efficiently. As a result, the technique improves both the deduplication ratio and the throughput. The evaluations show that the proposed deduplication method performs well.
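The role a Bloom filter plays in a deduplication pipeline can be illustrated with a minimal sketch. This is not the MLPMA described in the chapter, whose details the abstract does not give; it only shows the general idea of screening chunk fingerprints with a Bloom filter before consulting an exact index. The names `BloomFilter` and `deduplicate`, the filter size, the number of hash functions, and the use of SHA-256 fingerprints are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (sizes are illustrative assumptions)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely not seen"; True means "possibly seen".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def deduplicate(chunks):
    """Return (unique_chunks, duplicates_skipped) for a list of byte chunks."""
    bloom = BloomFilter()
    store = {}  # fingerprint -> chunk; stands in for a (hypothetical) on-disk index
    duplicates = 0
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).digest()
        # A Bloom-filter miss proves the chunk is new, so the exact index
        # lookup is only performed when the filter reports a possible hit.
        if bloom.might_contain(fingerprint) and fingerprint in store:
            duplicates += 1
            continue
        bloom.add(fingerprint)
        store[fingerprint] = chunk
    return list(store.values()), duplicates


unique, skipped = deduplicate([b"a", b"b", b"a", b"c", b"b"])
```

Because a Bloom filter can return false positives but never false negatives, the sketch confirms every possible hit against the exact index before discarding a chunk, so no unique data is lost.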

