AIR-IA: an analogous image removal approach using the intelligent archive

Jagdish Bakal, Jyoti Malhotra

doi:10.1504/ijac.2020.10037083

Abstract

Deduplication is maturing into a standard feature of backup and archive systems, where the aim is to free storage space by removing duplicates. Considering storage demand and the need for justifiable deletion, this paper proposes a multi-container intelligent deduplication image archive system in which analogous images are removed using a similarity-based approach. Similarity-aware image deduplication is achieved by calculating image fingerprints; images are deleted when their Hamming distance falls within a predefined threshold. A probability model is presented for the overall probability of finding similar images in the respective containers, based on the containers' relative storage and the similarity scores of the images. In addition, a linear optimisation model is formulated to target data minimisation and storage space maximisation, and is further verified with the dataset. We evaluate our work on existing as well as synthesised datasets, and calculate accuracy metrics in terms of precision, recall, F-score and deduplication ratio. We observe that the binary hashes used in our system make a fair contribution to removing similar images.
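The fingerprint-and-threshold idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which fingerprint algorithm the system uses, so the average-hash function below is an assumption chosen for simplicity, as are all function names, the greedy keep-first policy, and the reading of the threshold as "distance less than or equal to the threshold". Precision, recall and F-score follow their standard definitions.

```python
from typing import List, Sequence, Tuple


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Binary fingerprint of a small grayscale grid (assumed scheme):
    each bit is 1 if the pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def deduplicate(fingerprints: List[int], threshold: int) -> List[int]:
    """Return indices of images to keep. An image is dropped when its
    distance to an already kept image is <= threshold (assumed policy)."""
    kept: List[int] = []
    for i, fp in enumerate(fingerprints):
        if all(hamming_distance(fp, fingerprints[k]) > threshold for k in kept):
            kept.append(i)
    return kept


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F-score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical 2x2 grayscale grids: the first two are near-identical.
img_a = [[10, 200], [10, 200]]
img_b = [[12, 198], [11, 201]]
img_c = [[200, 10], [200, 10]]
hashes = [average_hash(img) for img in (img_a, img_b, img_c)]
kept_indices = deduplicate(hashes, threshold=1)
```

In this toy run the second image shares a fingerprint with the first and is dropped, while the third differs in every bit and is kept. A real archive would hash downscaled images of a fixed size (for example 8x8) and tune the threshold against labelled duplicates.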
