Abstract

Deduplication improves space utilization by keeping only one copy of duplicated data. In a distributed storage system, however, the overall deduplication ratio is limited because redundancy across nodes is difficult to eliminate. Traditional deduplication methods usually exploit data similarity and data locality to improve the deduplication ratio, but the frequent similarity calculations they require cause high system overhead. To address this problem, this paper proposes a new Feature-Aware Stateful Routing method (FASR), which aims to reduce system overhead while keeping a high deduplication ratio in a distributed environment. First, we design a feature-aware node selection strategy that chooses similar nodes by extracting data features and data distribution characteristics. This strategy avoids similarity calculations against nodes that are not similar to the data. Then, we present a stateful routing algorithm that determines the target node using super-chunk and handprint techniques, while also maintaining load balance across the entire distributed system. Finally, the data is deduplicated locally using a similarity index and a fingerprint cache. Extensive experiments demonstrate that FASR can reduce system overhead by up to around 30% and also achieves a higher deduplication ratio.
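
To make the routing step concrete, the following is a minimal sketch of handprint-based stateful routing in general. It is not the FASR algorithm itself, whose details the abstract does not give. All names here (`fingerprint`, `handprint`, `Node`, `route`) are hypothetical. The use of SHA-1, the choice of the `k` smallest fingerprints as the handprint, and the load-discount formula are assumptions made for illustration. The feature-aware node selection step is assumed to have already produced the `candidates` list.

```python
from __future__ import annotations

import hashlib


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of one chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).hexdigest()


def handprint(super_chunk: list[bytes], k: int = 4) -> set[str]:
    """Representative fingerprints: the k smallest in the super-chunk."""
    return set(sorted({fingerprint(c) for c in super_chunk})[:k])


class Node:
    """Hypothetical storage node holding the state that routing consults."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.similarity_index: set[str] = set()  # handprints seen so far
        self.stored: set[str] = set()            # fingerprints of stored chunks

    def store(self, super_chunk: list[bytes], k: int = 4) -> int:
        """Deduplicate locally; return the number of newly stored chunks."""
        new = {fingerprint(c) for c in super_chunk} - self.stored
        self.stored |= new
        self.similarity_index |= handprint(super_chunk, k)
        return len(new)


def route(super_chunk: list[bytes], candidates: list[Node], k: int = 4) -> Node:
    """Pick the candidate with the best load-discounted handprint overlap."""
    hp = handprint(super_chunk, k)
    avg = sum(len(n.stored) for n in candidates) / len(candidates)

    def key(n: Node) -> tuple[float, int]:
        overlap = len(hp & n.similarity_index)
        relative_load = len(n.stored) / avg if avg else 0.0
        # Assumed discount: heavier nodes need more overlap to win.
        # Ties are broken in favor of the node storing fewer chunks.
        return (overlap / (1.0 + relative_load), -len(n.stored))

    return max(candidates, key=key)
```

In this sketch a super-chunk that shares representative fingerprints with a node is sent to that node, while a super-chunk with no overlap anywhere goes to the least-loaded candidate, which is one simple way to combine similarity with load balance.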
