Abstract

Deduplication is becoming a standard feature of backup and archive systems, where the aim is to free storage space by removing duplicates. Considering storage demand and justifiable deletion, this paper proposes a multi-container intelligent deduplication image archive system in which similar images are removed from the system based on a similarity approach. Similarity-aware image deduplication is achieved by calculating image fingerprints; images are deleted when their Hamming distance meets the predefined threshold. A probability model is presented for the overall probability of finding similar images in the respective containers, based on their relative storage and the similarity scores of the images. In addition, a linear optimisation model is formulated to target data minimisation and storage space maximisation, and is further verified with the dataset. We evaluate our work on existing as well as synthesised datasets, and report accuracy metrics in terms of precision, recall, F-score and deduplication ratio. We observe that the binary hashes used in our system make a fair contribution to removing similar images.
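The fingerprint-and-threshold idea can be illustrated with a minimal sketch. The abstract does not name the hash function, so the average hash used here is an assumption chosen for simplicity, as are the function names, the tiny 2x2 pixel grids, and the reading that two images count as similar when their Hamming distance is at or below the threshold. This is not the paper's implementation.

```python
from typing import Dict, List

Grid = List[List[int]]  # grayscale pixel values, one inner list per row


def average_hash(pixels: Grid) -> int:
    """Binary fingerprint (assumed scheme): one bit per pixel,
    set to 1 when the pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def deduplicate(images: Dict[str, Grid], threshold: int) -> List[str]:
    """Keep an image only if its fingerprint is farther than `threshold`
    from every image already kept (assumption: distance <= threshold
    means 'similar', so the later image is dropped)."""
    kept = []  # list of (name, fingerprint) pairs
    for name, pixels in images.items():
        fp = average_hash(pixels)
        if all(hamming_distance(fp, kept_fp) > threshold for _, kept_fp in kept):
            kept.append((name, fp))
    return [name for name, _ in kept]


# Hypothetical data: "b" is a slightly noisy copy of "a"; "c" is its inverse.
images = {
    "a": [[10, 200], [220, 30]],
    "b": [[12, 198], [215, 35]],
    "c": [[200, 10], [30, 220]],
}
kept = deduplicate(images, threshold=1)
print(kept)
```

In practice the fingerprint would be computed on a downscaled image (for example 8x8 pixels, giving a 64-bit hash), and the threshold would be tuned against precision and recall on labelled data.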
