Abstract

Deduplication is a data-reduction technique that breaks streams of data down into very granular components, stores only the first instance of each data item on the destination media, and records every later duplicate occurrence as a reference in an index. Hash values are computed to identify duplicate data items. Fixed-size chunking (FSC) is a deduplication algorithm that breaks data into fixed-size chunks, or blocks, starting from the beginning of the file. Its main disadvantage is that if new data is inserted at the front or in the middle of a file, the remaining chunks are shifted from their original positions. The shifted chunks therefore yield new hash values, which lowers the deduplication ratio. This drawback can be overcome by calculating chunk hash values from the beginning as well as from the end of the file and storing both sets of values in the metadata table. A new algorithm, 'Dual Side Fixed Size Chunking', is proposed to achieve a higher deduplication ratio than existing FSC. Without resorting to computationally expensive variable-size chunking or content-defined chunking, this algorithm can be used effectively on video or audio files to achieve a better deduplication ratio. This data reduction provides network bandwidth savings and the ability to store more data on a given amount of disk or cloud storage. Reduced storage requirements, in turn, lower storage management and energy costs.
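The idea of hashing chunks from both ends of a file can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the chunk size, hash function, or matching procedure, so this sketch assumes SHA-256, a tiny 4-byte chunk size chosen for readability, and a simple measure: the number of bytes in a new file that are covered by at least one chunk whose hash is already in the metadata table. All function names (`front_chunks`, `back_chunks`, `build_index`, `duplicate_bytes`) are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, unrealistically small chunk size for illustration


def front_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Plain FSC: fixed-size chunks aligned to the start of the file, as (offset, chunk)."""
    return [(i, data[i:i + size]) for i in range(0, len(data), size)]


def back_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Fixed-size chunks aligned to the end of the file, as (offset, chunk)."""
    chunks = []
    end = len(data)
    while end > 0:
        start = max(0, end - size)  # the leftover short chunk ends up at the front
        chunks.append((start, data[start:end]))
        end = start
    return chunks


def digest(chunk: bytes) -> str:
    """Hash used to identify duplicate chunks (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


def all_chunks(data: bytes, dual: bool):
    """Front-aligned chunks only (FSC), or front- plus end-aligned chunks (dual side)."""
    return front_chunks(data) + (back_chunks(data) if dual else [])


def build_index(data: bytes, dual: bool) -> set:
    """Metadata table for a stored file: the set of its chunk hashes."""
    return {digest(chunk) for _, chunk in all_chunks(data, dual)}


def duplicate_bytes(data: bytes, index: set, dual: bool) -> int:
    """Count bytes of `data` covered by at least one chunk already in `index`."""
    covered = [False] * len(data)
    for offset, chunk in all_chunks(data, dual):
        if digest(chunk) in index:
            for i in range(offset, offset + len(chunk)):
                covered[i] = True
    return sum(covered)


original = b"ABCDEFGHIJKLMNOP"
edited = b"XY" + original  # two bytes inserted at the front shift every FSC chunk

for dual in (False, True):
    index = build_index(original, dual)
    dup = duplicate_bytes(edited, index, dual)
    print(f"dual={dual}: {dup}/{len(edited)} bytes already stored")
# dual=False: 0/18 bytes already stored
# dual=True: 16/18 bytes already stored
```

With front-aligned chunks alone, the two inserted bytes shift every chunk boundary, so no hash matches. The end-aligned chunks of the edited file are unaffected by the front insertion, so their hashes match the stored ones. For an insertion in the middle, the same reasoning suggests that front-aligned chunks before the insertion point and end-aligned chunks after it would still match.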
