Abstract

Recent years have seen rapid growth in the number of virtual machines and virtual machine images managed to support infrastructure as a service (IaaS). For example, Amazon Elastic Compute Cloud (EC2) has 6,521 public virtual machine images. This creates several challenges for managing image files in a cloud computing environment. In particular, the large amount of duplicate data in image files consumes significant storage space. To address this problem, we propose an effective image file storage technique that uses data deduplication with a modified fixed-size block scheme. When a user requests to store an image file, the technique first calculates a fingerprint for the whole image file and compares it with the fingerprints in a fingerprint library. If the fingerprint is already in the library, the image is stored as a pointer to the existing fingerprint. Otherwise, the image is processed using the fixed-size block image segmentation method. We design a metadata format for image files to organize their blocks, and a new MD5 index table of image files to reduce retrieval time. Experiments show that our technique can significantly reduce the transmission time of image files that already exist in storage. In addition, the deletion rate for image groups that have the same operating system version but different versions of software applications is up to about 58%.
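The storage path described above can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: it uses plain fixed-size blocks rather than the modified scheme, which the abstract does not detail. The class name `DedupStore`, its attributes, and the 4096-byte default block size are hypothetical choices made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


class DedupStore:
    """Toy image store: whole-file fingerprints plus fixed-size blocks."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}   # block MD5 -> block bytes (each unique block kept once)
        self.library = {}  # file MD5 -> ordered list of block MD5s (image metadata)
        self.images = {}   # image name -> file MD5 (pointer into the library)

    def store(self, name, data):
        """Store an image and return how many new blocks were written."""
        fingerprint = hashlib.md5(data).hexdigest()
        written = 0
        if fingerprint not in self.library:
            # Unknown image: split into fixed-size blocks and keep only new ones.
            recipe = []
            for offset in range(0, len(data), self.block_size):
                block = data[offset:offset + self.block_size]
                digest = hashlib.md5(block).hexdigest()
                if digest not in self.blocks:
                    self.blocks[digest] = block
                    written += 1
                recipe.append(digest)
            self.library[fingerprint] = recipe
        # Known or new, the image name is just a pointer to the fingerprint.
        self.images[name] = fingerprint
        return written

    def load(self, name):
        """Rebuild an image by concatenating its blocks in recorded order."""
        recipe = self.library[self.images[name]]
        return b"".join(self.blocks[digest] for digest in recipe)
```

With `block_size=4`, storing `b"AAAABBBBCCCC"` writes three blocks, storing the same bytes under another name writes none, and storing `b"AAAABBBBDDDD"` writes only the one block that differs. The whole-file check lets an identical image skip segmentation entirely, which is the case the abstract associates with reduced transmission time.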
