Abstract
Data deduplication has been widely adopted to reduce the storage requirements of virtual machine (VM) images on VM servers in virtualized cloud platforms. However, existing state-of-the-art deduplication approaches for VM images cannot fully exploit the potential of the underlying hardware while limiting the interference of deduplication with foreground VM services, which can degrade the quality of those services. In this paper, we present HPDV, a highly parallel deduplication cluster for VM images that exploits parallelism to achieve high throughput with minimal interference with foreground VM services. The main idea behind HPDV is to exploit the idle CPU resources of VM servers to parallelize the compute-intensive chunking and fingerprinting, and to parallelize the I/O-intensive fingerprint indexing in the deduplication servers by dividing the globally shared fingerprint index into multiple independent sub-indexes according to the operating systems of the VM images. To ensure the quality of VM services, a resource-aware scheduler is proposed that dynamically adjusts the number of parallel chunking and fingerprinting threads according to the CPU utilization of the VM servers. Our evaluation shows that, compared to Light, a state-of-the-art deduplication system for VM images, HPDV improves deduplication throughput by up to 67%.
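The resource-aware scheduling idea can be sketched as follows. This is a minimal illustrative sketch, not HPDV's actual implementation; the function name, the `reserve` parameter, and all thresholds are assumptions introduced here to show how a thread count could be derived from CPU headroom.

```python
# Hypothetical sketch of resource-aware scheduling: scale the number of
# chunking/fingerprinting worker threads with the CPU headroom left by
# foreground VM services. All names and thresholds are illustrative
# assumptions, not taken from HPDV's code.

def plan_worker_threads(cpu_utilization, num_cores, max_threads, reserve=0.2):
    """Return how many deduplication threads to run.

    cpu_utilization: current VM-server CPU utilization in [0.0, 1.0]
    num_cores:       total CPU cores on the VM server
    max_threads:     upper bound on parallel chunking/fingerprinting threads
    reserve:         fraction of CPU always kept free for VM services
    """
    headroom = max(0.0, 1.0 - cpu_utilization - reserve)
    # Convert headroom into whole idle cores, never exceeding the cap.
    threads = int(headroom * num_cores)
    return max(0, min(threads, max_threads))
```

Under this sketch, a half-loaded 8-core server (`plan_worker_threads(0.5, 8, 8)`) would run 2 deduplication threads, while a server at 90% utilization would run none, deferring deduplication until the foreground load drops.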