While data deduplication is an important data compression technique that removes copies of repeated data to enhance storage utilization, security and privacy risks arise since sensitive or delicate user data are at risk to both insider and outsider attacks. A distinct negative factor to performance of the technique is data fragmentation, which not only slows down the restoration process but also leads to massive power consumption. In this paper, we address this problem from the perspective of data layout. The kernel point of our method is a novel RAID-5-based cross grouping data layout (CGDL). We introduce a selective deduplication algorithm (SDD) to perform data replication and restoration. A new CGDL-based disk scheduling algorithm (LDP) is also proposed that predicts location dependence to save energy by eliminating the redundant disk read/write operations. We evaluate our new method on the Linux MD (multiple device) driver modules. The experiments show that, under a 10 disks 3 groups storage configuration, our method drastically (by 20%) improves restoration efficiency with only 7.6% reduction on the deduplication ratio, while reducing 23% power consumption.
Read full abstract